We study a multi-server model with $n$ flexible servers and $n$ queues, connected through a bipartite graph,
where the level of flexibility is captured by an upper bound on the graph's average degree, $d_n$. Applications
in content replication in data centers, skill-based routing in call centers, and flexible supply chains are among
our main motivations.

We focus on the scaling regime where the system size $n$ tends to infinity, while the overall traffic intensity
stays fixed. We show that a large capacity region and an asymptotically vanishing queueing delay are
simultaneously achievable even under limited flexibility ($d_n \ll n$). Our main results demonstrate that, when
$d_n \gg \ln n$, a family of expander-graph-based flexibility architectures has a capacity region that is within a
constant factor of the maximum possible, while simultaneously ensuring a diminishing queueing delay for
all arrival rate vectors in the capacity region. Our analysis is centered around a new class of
virtual-queue-based scheduling policies that rely on dynamically constructed job-to-server assignments on the
connectivity graph. For comparison, we also analyze a natural family of modular architectures, which is simpler
but has provably weaker performance. *
1. Introduction

At the heart of a number of modern queueing networks lies the problem of allocating processing 
resources (manufacturing plants, web servers, or call-center staff) to meet multiple types of demands 
that arrive dynamically over time (orders, data queries, or customer inquiries). It is usually the case 
that a fully flexible or completely resource-pooled system, where every unit of processing resource 
is capable of serving all types of demands, delivers the best possible performance. Our inquiry is, 
however, motivated by the unfortunate reality that such full flexibility is often infeasible due to 
overwhelming implementation costs (in the case of a data center) or human skill limitations (in 
the case of a skill-based call center).

What are the key benefits of flexibility and resource pooling in such queueing networks? Can
we harness the same benefits even when the degree of flexibility is limited, and how should the
network be designed and operated? These are the main questions that we wish to address. While
these questions can be approached from a few different angles, we will focus on the metrics of
capacity region and expected queueing delay: the former measures the system's robustness against
demand uncertainties, i.e., when the arrival rates for different demand types are unknown or likely
to fluctuate over time, while the latter is a direct reflection of performance. Our main message is
positive: in the regime where the system size is large, improvements in both the capacity region
and delay are jointly achievable even under very limited flexibility, given a proper choice of the
architecture (interconnection topology) and scheduling policy.

*May 2015; revised October 2016. A preliminary version of this paper appeared at Sigmetrics 2013 [30]; the
performance of the architectures proposed in the current paper is significantly better than the one in [30].
This research was supported in part by the NSF under grant CMMI-1234062.




Figure 1 Extreme cases of flexibility: $d_n = n$ versus $d_n = 1$.


Benefits of Full Flexibility. We begin by illustrating the benefits of flexibility and resource
pooling in a very simple setting. Consider a system of $n$ servers, each running at rate 1, and $n$
queues, where each queue stores jobs of a particular demand type. For each $i \in \{1,\ldots,n\}$, queue
$i$ receives an independent Poisson arrival stream of rate $\lambda_i$. The average arrival rate,
$\frac{1}{n}\sum_{i=1}^{n} \lambda_i$, is denoted by $\rho$, and is referred to as the traffic intensity. The sizes
of all jobs are independent and exponentially distributed with mean 1.

For the remainder of this paper, we will use a measure of flexibility given by the average number
of servers that a demand type can receive service from, denoted by $d_n$. Let us consider the two
extreme cases: a fully flexible system, with $d_n = n$ (Figure 1(a)), and an inflexible system, with
$d_n = 1$ (Figure 1(b)). Fixing the traffic intensity $\rho < 1$, and letting the system size, $n$, tend to
infinity, we observe the following qualitative benefits of full flexibility:

1. Large Capacity Region. In the fully flexible case and under any work-conserving scheduling
policy,^1 the collection of all jobs in the system evolves as an M/M/n queue, with arrival rate
$\sum_{i=1}^{n} \lambda_i$ and service rate $n$. It is easy to see that the system is stable for all arrival rates
that satisfy $\sum_{i=1}^{n} \lambda_i < n$. In contrast, in the inflexible system, since all M/M/1 queues operate
independently, we must have $\lambda_i < 1$, for all $i$, in order to achieve stability. Comparing the two, we
see that the fully flexible system has a much larger capacity region, and is hence more robust to
uncertainties or changes in the arrival rates.

^1 A work-conserving policy mandates that a server be always busy whenever there is at least one job in some queue
to which it is connected.

2. Diminishing Delay. Let $W$ be the steady-state expected waiting time in queue (time from
entering the queue to the initiation of service). As mentioned earlier, the total number of jobs in
the system for the fully flexible case evolves as an M/M/n queue with traffic intensity $\rho < 1$. It
is not difficult to verify that for any fixed value of $\rho$, the expected total number of jobs in the
queues is bounded above by a constant independent of $n$, and hence the expected waiting time in
queue satisfies $\mathbb{E}(W) \to 0$, as $n \to \infty$.^2 In contrast, the inflexible system is simply a
collection of $n$ independent M/M/1 queues, and hence the expected waiting time is
$\mathbb{E}(W) = \frac{\rho}{1-\rho} > 0$, for all $n$. Thus, the expected delay in the fully flexible system
vanishes asymptotically as the system size increases, but stays bounded away from zero in the inflexible system.
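
The contrast between the two extreme cases can be checked numerically. The following is a minimal sketch, not part of the original analysis: it assumes uniform arrival rates $\lambda_i = \rho$ in the inflexible case, uses the standard Erlang C formula for the M/M/n queue, and the function names `erlang_c` and `mmn_wait` are hypothetical.

```python
def erlang_c(n: int, offered_load: float) -> float:
    """Probability that an arriving job has to wait in an M/M/n queue.

    Uses the numerically stable Erlang B recursion, then converts to Erlang C.
    Requires offered_load < n (stability).
    """
    blocking = 1.0  # Erlang B with zero servers
    for k in range(1, n + 1):
        blocking = offered_load * blocking / (k + offered_load * blocking)
    utilization = offered_load / n
    return blocking / (1.0 - utilization * (1.0 - blocking))


def mmn_wait(n: int, rho: float) -> float:
    """Expected waiting time in queue for an M/M/n system with unit-rate
    servers and total arrival rate rho * n (the fully flexible case)."""
    return erlang_c(n, rho * n) / (n * (1.0 - rho))


rho = 0.9
for n in (1, 10, 100, 1000):
    # Second column: fully flexible system; third column: inflexible M/M/1 queues.
    print(n, mmn_wait(n, rho), rho / (1.0 - rho))
```

With $n = 1$ the two columns coincide, and the fully flexible column decreases as $n$ grows while the inflexible one stays at $\rho/(1-\rho)$.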

Preview of Main Results. Will the above benefits of fully flexible systems continue to be
present if the system only has limited flexibility, that is, if $d_n \ll n$? The main results of this paper
show that a large capacity region and an asymptotically vanishing delay can still be simultaneously
achieved, even when $d_n \ll n$. However, when flexibility is limited, the architecture and scheduling
policy need to be chosen with care. We show that, when $d_n \gg \ln n$, a family of expander-graph-based
flexibility architectures has the largest possible capacity region, up to a constant factor, while
simultaneously ensuring a diminishing queueing delay, of order $\ln n / d_n$ as $n \to \infty$, for all arrival
rate vectors in the capacity region (Theorem 3.4). For comparison, we also analyze a natural family
of modular architectures, which is simpler but has provably weaker performance (Theorems 3.5
and 3.6).


1.1. Motivating Applications 

We describe here several motivating applications for our model; Figure 2 illustrates the overall
architecture that they share. Content replication is commonly used in data centers for
bandwidth-intensive operations such as database queries m or video streaming [20], by hosting the same
piece of content on multiple servers. Here, a server corresponds to a physical machine in the data
center, and each queue stores incoming demands for a particular piece of content (e.g., a video
clip). A server $j$ is connected to queue $i$ if there is a copy of content $i$ on server $j$, and $d_n$ reflects
the average number of replicas per piece of content across the network. Similar structures also arise
in skill-based routing in call centers, where agents (servers) are assigned to answer calls from
different categories (queues) based on their domains of expertise [SS], and in process-flexible
supply chains [HI [23 ISl IISI [TU], where each plant (server) is capable of producing multiple
product types (queues). In many of these applications, demand rates can be unpredictable and may
change significantly over time; for instance, unexpected “spikes” in demand traffic are common
in modern data centers m. These demand uncertainties make robustness an important criterion
for system design. These practical concerns have been our primary motivation for studying the
interplay between robustness, performance, and the level of flexibility.

Figure 2 A processing network with $rn$ queues and $n$ servers.

^2 The fact that the expected waiting time vanishes asymptotically follows from the bounded expected total number
of jobs in steady-state, the assumption that the total arrival rate is $\rho n$, which goes to infinity as
$n \to \infty$, and Little's Law.

1.2. Related Research 

Bipartite graphs provide a natural model for capturing the relationships between demand types 
and service resources. It is well known in the supply chain literature that limited flexibility,
corresponding to a sparse bipartite graph, can be surprisingly effective in resource allocation even when
compared to a fully flexible system HSldnillSlElES!. The use of sparse random graphs or expanders 
as flexibility structures to improve robustness has recently been studied in mm in the context 
of supply chains, and in m for content replication. Similar to the robustness results reported in 
this paper, these works show that random graphs or expanders can accommodate a large set of 
demand rates. However, in contrast to our work, nearly all analytical results in this literature focus 
on static allocation problems, where one tries to match supply with demand in a single shot, as 
opposed to our model, where resource allocation decisions need to be made dynamically over time. 

In the queueing theory literature, the models that we consider fall under the umbrella of
multi-class multi-server systems, where a set of servers are connected to a set of queues through a bipartite
graph. Under these (and similar) settings, complete resource pooling (full flexibility) is known to
improve system performance 12IIII21I3]. However, much less is known when only limited flexibility
is available: systems with a non-trivial connectivity graph are extremely difficult to analyze, even
under seemingly simple scheduling policies (e.g., first-come first-serve) [23131]. Simulations in [32]
show empirically that limited cross-training can be highly effective in a large call center under
a skill-based routing algorithm. Using a very different set of modeling assumptions, [2] proposes
a specific chaining structure with limited flexibility, which is shown to perform well under heavy
traffic. Closer to the spirit of the current work is [29], which studies a partially flexible system
where a fraction $p > 0$ of all processing resources are fully flexible, while the remaining fraction,
$1 - p$, is dedicated to specific demand types, and which shows an exponential improvement in delay
scaling under heavy traffic. However, both [2] and [29] focus on the heavy-traffic regime, which is
different from the current setting where traffic intensity is assumed to be fixed, and the analytical
results in both works apply only to uniform demand rates. Furthermore, with a constant fraction
of the resources being fully flexible, the average degree in [29] scales linearly with the system size
$n$, whereas here we are interested in the case of a much slower (sub-linear) degree scaling.

At a higher level, our work is focused on the interplay between robustness, delay, and the degree 
of flexibility in a queueing network, which is much less studied in the existing literature, and 
especially for networks with a non-trivial interconnection topology. 

On the technical end, we build on several existing ideas. The techniques of batching (cf. [241125])
and the use of virtual queues (cf. [221 |T^) have appeared in many contexts in queueing theory,
but the specific models considered in the literature bear little resemblance to ours. The study of
expander graphs has become a rich field in mathematics (cf. [14]), but we will refrain from providing
a thorough review because only some elementary and standard properties of expander graphs are
used in the current paper.

We finally note that preliminary (and weaker) versions of some of the results were included in
the conference paper [30].

Organization of the Paper. We describe the model in Section 2, along with the notation
to be used throughout. The main results are provided in Section 3. The construction and the
analysis associated with the Expander architecture will be presented separately, in a later section. We
conclude the paper with a further discussion of the results as well as directions for
future research.

2. Model and Metrics 

2.1. Queueing Model and Interconnection Topologies 

The Model. We consider a sequence of systems operating in continuous time, indexed by the
integer $n$, where the $n$th system consists of $rn$ queues and $n$ servers (Figure 2), and where $r$ is a
constant that is held fixed as $n$ varies. For simplicity, we will set $r$ to 1 but note that all results
and arguments in this paper can be extended to the case of general $r$ without difficulty.

A flexible architecture is represented by an $n \times n$ undirected bipartite graph $g_n = (E, I \cup J)$,
where $I$ and $J$ represent the sets of queues and servers, respectively, and $E$ the set of edges between
them.^3 We will also refer to $I$ and $J$ as the sets of left and right nodes, respectively. A server $j \in J$
is capable of serving a queue $i \in I$, if and only if $(i,j) \in E$. We will use the following notation.

1. Let $\mathcal{G}_n$ be the set of all $n \times n$ bipartite graphs.

2. For $g_n \in \mathcal{G}_n$, let $\deg(g_n)$ be the average degree among the $n$ left nodes, which is the same as
the average degree of the right nodes.

3. For a subset of nodes, $M \subset I \cup J$, let $g|_M$ be the graph induced by $g$ on the nodes in $M$.

4. Denote by $\mathcal{N}(i)$ the set of servers in $J$ connected to queue $i$, and similarly, by $\mathcal{N}(j)$ the set
of queues in $I$ connected to server $j$.

Each queue $i$ receives a stream of incoming jobs according to a Poisson process of rate $\lambda_{n,i}$,
independent of all other streams, and we define $\lambda_n = (\lambda_{n,1}, \lambda_{n,2}, \ldots, \lambda_{n,n})$, which is
the arrival rate vector. When the value of $n$ is clear from the context, we sometimes suppress the subscript $n$
and write $\lambda = (\lambda_1, \ldots, \lambda_n)$ instead. The sizes of the jobs are exponentially distributed with mean 1,
independent from each other and from the arrival processes. All servers are assumed to be running
at a constant rate of 1. The system is assumed to be empty at time $t = 0$.

Jobs arriving at queue $i$ can be assigned (immediately, or in the future) to an idle server $j \in \mathcal{N}(i)$
to receive service. The assignment is binding: once the assignment is made, the job cannot be
transferred to, or simultaneously receive service from, any other server. Moreover, service is
non-preemptive: once service is initiated for a job, the assigned server has to dedicate its full capacity
to this job until its completion.^4 Formally, if a server $j$ has just completed the service of a previous
job at time $t$ or is idle, its available actions are: (a) Serve a new job: Server $j$ can choose to
fetch a job from any queue in $\mathcal{N}(j)$ and immediately start service. The server will remain occupied
and take no other actions until the processing of the current job is completed, which will take an
amount of time that is equal to the size of the job. (b) Remain idle: Server $j$ can choose to
remain idle. While in the idling state, it will be allowed to initiate a service (Action (a)) at any
point in time.

Given the limited set of actions available to the server, the performance of the system is fully
determined by a scheduling policy, $\pi$, which specifies for each server $j \in J$, (a) when to remain idle,
and when to serve a new job, and (b) from which queue in $\mathcal{N}(j)$ to fetch a job when initiating
a new service. We only allow policies that are causal, in the sense that the decision at time $t$
depends only on the history of the system (arrivals and service completions) up to $t$. We allow the
scheduling policy to be centralized (i.e., to have full control over all server actions) based on the
knowledge of all queue lengths and server states. On the other hand, the policy does not observe
the actual sizes of the jobs before they are served.

^3 For simplicity of notation, we omit the dependence of $E$, $I$, and $J$ on $n$.

^4 While we restrict to binding and non-preemptive scheduling policies, other common architectures where (a) a server
can serve multiple jobs concurrently (processor sharing), (b) a job can be served by multiple servers concurrently,
or (c) job sizes are revealed upon entering the system, are clearly more powerful than the current setting, and are
therefore capable of implementing the scheduling policies considered in this paper. As a result, the performance upper
bounds developed in this paper also apply to these more powerful variations.

2.2. Performance Metrics 

Characterization of Arrival Rates. We will restrict ourselves to arrival rate vectors with
average traffic intensity at most $\rho$, i.e.,

$$\sum_{i=1}^{n} \lambda_i \le \rho n, \qquad (1)$$

where $\rho \in (0,1)$ will be treated throughout the paper as a given absolute constant. To quantify
the level of variability or uncertainty of a set of arrival rate vectors, $\Lambda$, we introduce a fluctuation
parameter, denoted by $u_n$, with the property that $\lambda_i < u_n$, for all $i$ and $\lambda \in \Lambda$.

Note that, for a graph with maximum degree $d_n$, the fluctuation parameter should not exceed $d_n$,
because otherwise there could exist some $\lambda \in \Lambda$ under which at least one queue would be unstable.
Therefore, the best we can hope for is a flexible architecture that can accommodate arrival rate
vectors with a $u_n$ that is close to $d_n$. The following condition formally characterizes the range of
arrival rate vectors we will be interested in, parameterized by the fluctuation parameter, $u_n$, and
traffic intensity, $\rho$.

Condition 2.1 (Rate Condition) Fix $n \ge 1$ and some $u_n > 0$. We say that a (non-negative)
arrival rate vector $\lambda$ satisfies the rate condition if the following hold:

1. $\max_{1 \le i \le n} \lambda_i < u_n$.

2. $\sum_{i=1}^{n} \lambda_i \le \rho n$.

We denote by $\Lambda_n(u_n)$ the set of all arrival rate vectors that satisfy the above conditions.
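
As an illustration, membership in $\Lambda_n(u_n)$ can be tested directly. The sketch below assumes that the first inequality of the rate condition is strict and the second is weak; the function name `satisfies_rate_condition` is hypothetical.

```python
def satisfies_rate_condition(lam, u_n, rho):
    """Check the rate condition for an arrival rate vector `lam`.

    Assumed reading: every rate is non-negative and strictly below the
    fluctuation parameter u_n, and the total rate is at most rho * n.
    """
    n = len(lam)
    return (
        all(rate >= 0 for rate in lam)
        and max(lam) < u_n
        and sum(lam) <= rho * n
    )


# One "spiky" queue is allowed as long as the average load stays at most rho.
print(satisfies_rate_condition([0.5, 1.5, 0.1, 0.3], u_n=2.0, rho=0.7))
```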

Capacity Region. The capacity region for a given architecture is defined as the set of all arrival 
rate vectors that it can handle. As mentioned in the Introduction, a larger capacity region indicates 
that the architecture is more robust against uncertainties or changes in the arrival rates. More 
formally, we have the following definition. 

Definition 2.2 (Feasible Demands and Capacity Region) Let $g = (I \cup J, E)$ be an $n \times n'$
bipartite graph. An arrival rate vector $\lambda = (\lambda_1, \ldots, \lambda_n)$, is said to be feasible if there exists a flow,
$f = \{f_{ij} : (i,j) \in E\}$, such that

$$\lambda_i = \sum_{j \in \mathcal{N}(i)} f_{ij}, \quad \forall i \in I,$$

$$\sum_{i \in \mathcal{N}(j)} f_{ij} < 1, \quad \forall j \in J, \qquad (2)$$

$$f_{ij} \ge 0, \quad \forall (i,j) \in E.$$

In this case, we say that the flow $f$ satisfies the demand $\lambda$. The capacity region of $g$, denoted by
$R(g)$, is defined as the set of all feasible demand vectors of $g$.
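
Feasibility in the sense of Definition 2.2 can be tested on small examples with a maximum-flow computation. The following is a minimal sketch under stated assumptions, not the method used in the paper: the strict server constraint is approximated by capping each server at `1 - slack`, so rate vectors within `slack` of the boundary may be misclassified, and the helper names `_max_flow` and `is_feasible` are hypothetical.

```python
from collections import deque


def _max_flow(cap, source, sink):
    """Edmonds-Karp maximum flow on a dict-of-dicts of residual capacities."""
    total = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
        total += bottleneck


def is_feasible(lam, edges, n_servers, slack=1e-6):
    """Return True if `lam` admits a flow with every server load below 1.

    `edges` is an iterable of (queue index, server index) pairs.
    """
    cap = {}

    def add_edge(u, v, c):
        cap.setdefault(u, {})[v] = c
        cap.setdefault(v, {}).setdefault(u, 0.0)

    for i, rate in enumerate(lam):
        add_edge("src", ("q", i), rate)          # demand of queue i
    for i, j in edges:
        add_edge(("q", i), ("s", j), float("inf"))  # flexibility edge (i, j)
    for j in range(n_servers):
        add_edge(("s", j), "snk", 1.0 - slack)   # approximates load < 1
    return _max_flow(cap, "src", "snk") >= sum(lam) - 1e-9


inflexible = [(0, 0), (1, 1)]
fully_flexible = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(is_feasible([1.5, 0.2], inflexible, 2), is_feasible([1.5, 0.2], fully_flexible, 2))
```

The same demand vector is infeasible for the inflexible graph and feasible for the fully flexible one, which mirrors the capacity-region comparison in the Introduction.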


It is well known that there exists a policy under which the steady-state expected delay is finite if
and only if $\lambda \in R(g_n)$; the strict inequalities in Definition 2.2 are important here. For the remainder
of the paper, we will use the fluctuation parameter $u_n$ (cf. Condition 2.1) to gauge the size of the
capacity region, $R(g_n)$, of an architecture. For instance, if $\Lambda_n(u_n) \subset R(g_n)$, then the architecture
$g_n$, together with a suitable scheduling policy, allows for finite steady-state expected delay, for any
arrival rate vector in $\Lambda_n(u_n)$.

Vanishing Delay. We define the expected average delay, $\mathbb{E}(W \mid \lambda, g, \pi)$, under the arrival rate
vector $\lambda$, flexible architecture $g$, and scheduling policy $\pi$, as follows. We denote by $W_{i,m}$ the waiting
time in queue experienced by the $m$th job arriving to queue $i$, define

$$\mathbb{E}(W_i) = \limsup_{m \to \infty} \mathbb{E}(W_{i,m}),$$

and let

$$\mathbb{E}(W \mid \lambda, g, \pi) = \frac{1}{\sum_{i \in I} \lambda_i} \sum_{i \in I} \lambda_i \, \mathbb{E}(W_i). \qquad (3)$$

In the sequel, we will often omit the mention of $\pi$, and sometimes of $g$, and write $\mathbb{E}(W \mid \lambda, g)$ or
$\mathbb{E}(W \mid \lambda)$, in order to place emphasis on the dependencies that we wish to focus on.^5

The delay performance of the system is measured by the following criteria: (a) for what
ranges $\Lambda_n(u_n)$ of arrival rates, $\lambda$, does delay diminish to zero as the system size increases, i.e.,
$\sup_{\lambda \in \Lambda_n(u_n)} \mathbb{E}(W \mid \lambda) \to 0$ as $n \to \infty$, and (b) at what speed does the delay
diminish, as a function of $n$?
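
The expected average delay is a rate-weighted average of the per-queue delays: a queue that receives more jobs contributes proportionally more. A minimal sketch of this computation, with the hypothetical name `average_delay` and per-queue delays supplied as plain numbers:

```python
def average_delay(lam, per_queue_delay):
    """Rate-weighted average of per-queue expected waiting times.

    `lam[i]` is the arrival rate of queue i and `per_queue_delay[i]` is its
    expected waiting time; at least one rate must be positive.
    """
    total_rate = sum(lam)
    weighted = sum(rate * delay for rate, delay in zip(lam, per_queue_delay))
    return weighted / total_rate


# Queue 0 waits 2 time units but carries only a quarter of the traffic.
print(average_delay([1.0, 3.0], [2.0, 0.0]))
```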


2.3. Notation 

We will denote by $\mathbb{N}$, $\mathbb{Z}_+$, and $\mathbb{R}_+$, the sets of natural numbers, non-negative integers, and
non-negative reals, respectively. The following short-hand notation for asymptotic comparisons will be
used often, as an alternative to the usual $O(\cdot)$ notation; here $f$ and $g$ are positive functions, and
$L$ is a certain limiting value of interest, in the set of extended reals, $\mathbb{R} \cup \{-\infty, +\infty\}$:

1. $f(x) \lesssim g(x)$ or $g(x) \gtrsim f(x)$ for $\limsup_{x \to L} f(x)/g(x) < \infty$;

2. $f(x) \ll g(x)$ or $g(x) \gg f(x)$ for $\limsup_{x \to L} f(x)/g(x) = 0$;

3. $f(x) \sim g(x)$ for $\lim_{x \to L} f(x)/g(x) = 1$.

We will minimize the use of floors and ceilings, to avoid the cluttering of notation, and thus
assume that all values of interest are appropriately rounded up or down to an integer, whenever
doing so does not cause ambiguity or confusion. Whenever suitable, we will use upper-case letters
for random variables, and lower-case letters for deterministic values.

^5 Note that $\mathbb{E}(W \mid \lambda)$ captures a worst-case expected waiting time across all jobs in the long run, and is
always well defined, even under scheduling policies that do not induce a steady-state distribution.
3. Main Results: Capacity Region and Delay of Flexible Architectures

The statements of our main results are given in this section. Below is a high-level summary of our
results; a more complete comparison is given in Table 1.


| Flexible architectures | Rate Conditions | Capacity Region | Delay |
|---|---|---|---|
| Expander (Theorem 3.4) | $d_n \gg \ln n$, $u_n \lesssim d_n$ | Good for all $\lambda$ | Good for all $\lambda$, with $\mathbb{E}(W) \lesssim \ln n / d_n$ |
| Modular (Theorems 3.5, 3.7) | $d_n \gg 1$ | Bad for some $\lambda$ (even if $u_n \lesssim 1$) | Good for uniform $\lambda$, with $\mathbb{E}(W) \lesssim \exp(-c \cdot d_n)$ |
| Random Modular (w.h.p.) (Theorems 3.6, 3.7) | $d_n \gtrsim \ln n$, $u_n \lesssim d_n / \ln n$ | Good for most $\lambda$, bad for some $\lambda$ | Good for most $\lambda$, with $\mathbb{E}(W) \lesssim \exp(-c \cdot d_n)$; bad for some $\lambda$ |

Table 1

This table summarizes and compares the flexibility architectures that we study, in terms of capacity and delay.
We say that capacity is “good” for $\lambda$ if $\lambda$ falls within the capacity region of the architecture, and that delay is
“good” if the expected delay is vanishingly small for large $n$. When describing the size of the set of $\lambda$ for which a
statement applies, we use the following (progressively weaker) quantifiers:

1. “For all” means that the statement holds for all $\lambda \in \Lambda_n(u_n)$;

2. “For most” means that the statement holds with high probability when $\lambda$ is drawn from an arbitrary
distribution over $\Lambda_n(u_n)$, independently from any randomization in the construction of the flexibility architecture;

3. “For some” means that the statement is true for a non-empty set of values of $\lambda$.

The label “w.h.p.” means that all statements in the corresponding row hold with high probability with respect to
the randomness in generating the flexibility architecture.


Our main results focus on an Expander architecture, where the interconnection topology is
an expander graph with appropriate expansion. We show that, when $d_n \gg \ln n$, the Expander
architecture has a capacity region that is within a constant factor of the maximum possible among
all graphs with average degree $d_n$, while simultaneously ensuring an asymptotically diminishing
queueing delay of order $\ln n / d_n$ for all arrival rate vectors in the capacity region, as $n \to \infty$
(Theorem 3.4). Our analysis involves a new class of virtual-queue-based scheduling policies that rely on
dynamically constructed job-to-server assignments on the connectivity graph.

Our secondary results concern a Modular architecture, which has a simpler construction and
scheduling rule compared to the Expander architecture. The Modular architecture consists of a
collection of separate smaller subnetworks, with complete connectivity between all queues and
servers within each subnetwork. Since the subnetworks are disconnected from each other, a Modular
architecture does not admit a large capacity region: there always exists an infeasible arrival rate
vector even when the fluctuation parameter is of constant order (Theorem 3.5). Nevertheless, we
show that with proper randomization in the construction of the subnetworks (Randomized Modular
architecture), a simple greedy scheduling policy is able to deliver asymptotically vanishing delay
for “most” arrival rate vectors with nearly optimal fluctuation parameters, with high probability
(Theorem 3.6). These findings suggest that, thanks to its simplicity, the Randomized Modular
architecture could be a viable alternative to the Expander architecture if the robustness requirement
is not as stringent and one is content with probabilistic guarantees on system stability.


3.1. Preliminaries 

Before proceeding, we provide some information on expander graphs, which will be used in some 
of our constructions and proofs. 


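Definition 3.1 An $n \times n'$ bipartite graph $(I \cup J, E)$ is an $(\alpha, \beta)$-expander, if for all $S \subset I$ that
satisfy $|S| \le \alpha n$, we have that $|\mathcal{N}(S)| \ge \beta |S|$, where $\mathcal{N}(S) = \bigcup_{i \in S} \mathcal{N}(i)$ is the set
of nodes in $J$ that are connected to some node in $S$.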
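
For very small graphs, the expansion property can be verified by exhaustive enumeration. The following sketch is only an illustration (its running time is exponential in $\alpha n$); it assumes that the two inequalities in the definition are weak, and the name `is_expander` and the adjacency representation are hypothetical.

```python
from itertools import combinations


def is_expander(n_left, neighbors, alpha, beta):
    """Brute-force check of the (alpha, beta)-expansion property.

    `neighbors[i]` is the set of right nodes adjacent to left node i.
    Every non-empty left subset S with |S| <= alpha * n_left must satisfy
    |N(S)| >= beta * |S|.
    """
    max_size = int(alpha * n_left + 1e-9)  # small guard against float rounding
    for size in range(1, max_size + 1):
        for subset in combinations(range(n_left), size):
            reached = set().union(*(neighbors[i] for i in subset))
            if len(reached) < beta * size:
                return False
    return True


complete = [{0, 1, 2}, {0, 1, 2}, {0, 1, 2}]   # fully flexible 3 x 3 graph
matching = [{0}, {1}, {2}]                     # inflexible 3 x 3 graph
print(is_expander(3, complete, 1 / 3, 3), is_expander(3, matching, 1.0, 2))
```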


The usefulness of expanders in our context comes from the following lemma, which relates the
parameters of an expander to the size of its capacity region, as measured by the fluctuation
parameter, $u_n$. The proof is elementary and is given in Appendix A.1.

Lemma 3.2 (Capacity of Expanders) Fix $n, n' \in \mathbb{N}$, $\rho \in (0,1)$, $\gamma > \rho$. Suppose that an $n \times n'$
bipartite graph, $g_n$, is a $(\gamma/\beta_n, \beta_n)$-expander, where $\beta_n \ge u_n$. Then $\Lambda_n(u_n) \subset R(g_n)$.

The following lemma ensures that such expander graphs exist for the range of parameters that
we are interested in. The lemma is a simple consequence of a standard result on the existence of
expander graphs, and its proof is given in Appendix A.2.

Lemma 3.3 Fix $\rho \in (0,1)$. Suppose that $d_n \to \infty$ as $n \to \infty$. Let
$\beta_n = \frac{1}{2} \cdot \frac{\ln(1/\rho)}{1 + \ln(1/\rho)} \, d_n$, and $\gamma = \sqrt{\rho}$.
There exists $n' > 0$, such that for all $n \ge n'$, there exists an $n \times n$ bipartite graph which is a
$(\gamma/\beta_n, \beta_n)$-expander with maximum degree $d_n$.

Remark. It is well known that random graphs with appropriate average degree are expanders
with high probability (cf. [14]). For instance, it is not difficult to show that if $d_n \gg \ln n$ and
$\beta_n \ll d_n / \ln n$, then an Erdős-Rényi random bipartite graph with average degree $d_n$ is a
$(\gamma/\beta_n, \beta_n)$-expander, with high probability, as $n \to \infty$ (cf. Lemma 3.12 of [33]). We note, however, that to
deterministically construct expanders in a computationally efficient manner can be challenging and
is in and of itself an active field of research; the reader is referred to the survey paper m and the
references therein.


3.2. Expander Architecture 

Construction of the Architecture. The connectivity graph in the Expander architecture is
an expander graph with maximum degree $d_n$ and appropriate expansion.

Scheduling Policy. We employ a scheduling policy that organizes the arrivals into batches,
stores the batches in a virtual queue, and dynamically assigns the jobs in a batch to appropriate
servers. Theorem 3.4, which is the main result of this paper, shows that under this policy the
Expander architecture achieves an asymptotically vanishing delay for all arrival rate vectors in the
set $\Lambda_n(u_n)$. Of course, we assume that $n$ is sufficiently large so that the corresponding expander
graph exists (Lemma 3.3 with $\rho$ replaced with $\hat{\rho}$). At a high level, the strong guarantees stem from
the excellent connectivity of an expander graph, and similarly of random subsets of an expander
graph, a fact which we will exploit to show that jobs arriving to the system during a small time
interval can be quickly assigned to connected idle servers with high probability, which then leads to
a small delay. The proof of the theorem, including a detailed description of the scheduling policy,
is given in a later section.


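Theorem 3.4 (Capacity and Delay of Expander Architectures) Let

$$\hat{\rho} = \frac{1}{1 + (1-\rho)/8}.$$

For every $n \in \mathbb{N}$, define

$$\beta_n = \frac{1}{2} \cdot \frac{\ln(1/\hat{\rho})}{\ln(1/\hat{\rho}) + 1} \, d_n \quad \text{and} \quad \gamma = \sqrt{\hat{\rho}}.$$

Suppose that $\ln n \ll d_n \ll n$, and

$$u_n \le \frac{1-\rho}{2} \, \beta_n.$$

Let $g_n$ be a $(\gamma/\beta_n, \beta_n)$-expander with maximum degree $d_n$. The following holds.

1. There exists a scheduling policy, $\pi_n$, under which

$$\sup_{\lambda_n \in \Lambda_n(u_n)} \mathbb{E}(W \mid \lambda_n, g_n) \le \frac{c \ln n}{d_n}, \qquad (4)$$

where $c$ is a constant independent of $n$ and $g_n$.

2. The scheduling policy, $\pi_n$, only depends on $g_n$ and an upper bound on the traffic intensity, $\rho$.
It does not require knowledge of the arrival rate vector $\lambda_n$.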
Note that when $\rho$ is viewed as a constant, the upper bound on $u_n$ in the statement of Theorem
3.4 is just a constant multiple of $d_n$. Since the fluctuation parameter, $u_n$, should be no more than
$d_n$ for stability to be possible, the size of $\Lambda_n(u_n)$ in Theorem 3.4 is within a constant factor of the
best possible.
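
To get a feel for the constants involved, the sketch below evaluates the expander parameters for given $\rho$ and $d_n$. It assumes that the definitions in Theorem 3.4 are read as $\hat{\rho} = 1/(1 + (1-\rho)/8)$, $\beta_n = \frac{1}{2} \cdot \frac{\ln(1/\hat{\rho})}{1 + \ln(1/\hat{\rho})} d_n$, and $\gamma = \sqrt{\hat{\rho}}$; this reading, and the function name `expander_parameters`, are assumptions made for illustration.

```python
import math


def expander_parameters(rho, d_n):
    """Evaluate (rho_hat, beta_n, gamma) under the assumed reading of Theorem 3.4."""
    rho_hat = 1.0 / (1.0 + (1.0 - rho) / 8.0)
    log_term = math.log(1.0 / rho_hat)
    beta_n = 0.5 * log_term / (1.0 + log_term) * d_n
    gamma = math.sqrt(rho_hat)
    return rho_hat, beta_n, gamma


rho_hat, beta_n, gamma = expander_parameters(rho=0.5, d_n=1000)
# beta_n is a constant multiple of d_n, with a constant that depends only on rho.
print(rho_hat, beta_n / 1000, gamma)
```

Under this reading, $\rho < \hat{\rho} < \gamma < 1$, and the ratio $\beta_n / d_n$ shrinks as $\rho$ approaches 1.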

Remark. Compared to our earlier results, in a preliminary version of this paper (Theorem 1 in
[30]), Theorem 3.4 is stronger in two major aspects: (1) the guarantee for diminishing delay holds
deterministically over all arrival rate vectors in $\Lambda_n(u_n)$, as opposed to “with high probability”
over the randomness in the generation of $g_n$, and (2) the fluctuation parameter, $u_n$, is allowed to
be of order $d_n$ in Theorem 3.4, while [30] required that $u_n \ll d_n / \ln n$. The flexible architecture
considered in [30] was based on Erdős-Rényi random graphs. It also employed a scheduling policy
based on virtual queues, as in this paper. However, the policy in the present paper is simpler to
describe and analyze.


3.3. Modular Architectures 

In a Modular architecture, the designer partitions the network into $n/d_n$ separate subnetworks.
Each subnetwork consists of $d_n$ queues and servers that are fully connected (Figure 3), but
disconnected from queues and servers in other subnetworks.

Construction of the Architecture. Formally, the construction is as follows.

1. We partition the set $J$ of servers into $n/d_n$ disjoint subsets (“clusters”) $B_1, \ldots, B_{n/d_n}$, all
having the same cardinality $d_n$. For concreteness, we assign the first $d_n$ servers to the first
cluster, $B_1$, the next $d_n$ servers to the second cluster, etc.

2. We form a partition $\sigma_n = (A_1, \ldots, A_{n/d_n})$ of the set $I$ of queues into $n/d_n$ disjoint subsets
(“clusters”) $A_k$, all having the same cardinality $d_n$.

3. To construct the interconnection topology, for $k = 1, \ldots, n/d_n$, we connect every queue $i \in A_k$
to every server $j \in B_k$. A pair of queue and server clusters with the same index $k$ will be
referred to as a subnetwork.

Note that in a Modular architecture, the degree of each node is equal to the size, $d_n$, of the
clusters. Note also that different choices of $\sigma_n$ yield isomorphic architectures. When $\sigma_n$ is drawn
uniformly at random from the set of all possible partitions of $I$ into $n/d_n$ subsets of size $d_n$, we call
the resulting topology a Random Modular architecture.
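
The construction steps above can be sketched as follows. This is an illustrative sketch only: queues and servers are assumed to be labeled $0, \ldots, n-1$, the random partition is obtained by shuffling the queue labels and cutting them into consecutive blocks, and the function name `modular_architecture` is hypothetical.

```python
import random


def modular_architecture(n, d, rng=None):
    """Return the edge set {(queue, server)} of a Modular architecture.

    Servers k*d, ..., (k+1)*d - 1 form cluster k. With rng=None, the queue
    clusters use the same blocks; passing a random.Random instance shuffles
    the queues first, which gives a Random Modular architecture.
    """
    if n % d != 0:
        raise ValueError("the cluster size d must divide n")
    queues = list(range(n))
    if rng is not None:
        rng.shuffle(queues)
    edges = set()
    for k in range(n // d):
        queue_cluster = queues[k * d:(k + 1) * d]
        server_cluster = range(k * d, (k + 1) * d)
        for i in queue_cluster:
            for j in server_cluster:
                edges.add((i, j))  # complete connectivity inside subnetwork k
    return edges


deterministic = modular_architecture(6, 3)
randomized = modular_architecture(6, 3, rng=random.Random(0))
print(len(deterministic), len(randomized))
```

In both cases every queue and every server has degree exactly $d_n$, so the graph has $n d_n$ edges.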

Scheduling Policy. We use a simple greedy policy, equivalent to running each subnetwork as
an M/M/$d_n$ queue. Whenever a server $j \in B_k$ becomes available, it starts serving a job from any
non-empty queue in $A_k$. Similarly, when a job arrives at queue $i \in A_k$, it is immediately assigned
to an arbitrary idle server in $B_k$ if such a server exists, and waits in queue $i$, otherwise.

Figure 3 A Modular architecture consisting of $n/d_n$ subnetworks, each with $d_n$ queues and servers. Within each
subnetwork, all servers are connected to all queues.


Our first result points out that a Modular architecture does not have a large capacity region:
for any partition $\sigma_n$, there always exists an infeasible arrival rate vector, even if $u_n$ is small, of
order $O(1)$. The proof is given in Appendix A.3. Note that this is a negative result that applies no
matter what scheduling policy is used.


Theorem 3.5 (Capacity Region of Deterministic Modular Architectures) Fix n > 1 and 

some u„ > 1. Let gn be a Modular architecture with average degree dn < ^n. Then, there exists 
$\lambda_n \in \Lambda_n(u_n)$ such that $\lambda_n \notin R(g_n)$.

However, if we are willing to settle for a weaker result on the capacity region, the next theorem
states that with the Random Modular architecture, any given arrival rate vector $\lambda_n$ has high
probability (with respect to the random choice of the partition $\sigma_n$) of belonging to the capacity
region, if the fluctuation parameter, $u_n$, is of order $O(d_n/\ln n)$, but no more than that. Intuitively,
this is because the randomization in the connectivity structure makes it unlikely that many large
components of $\lambda_n$ reside in the same subnetwork. The proof is given in Appendix A.4.

Theorem 3.6 (Capacity Region of Random Modular Architectures) Let $\sigma_n$ be drawn
uniformly at random from the set of all partitions, and let $G_n$ be the resulting Random Modular
architecture. Let $\mathbb{P}_{G_n}$ be the probability measure that describes the distribution of $G_n$. Fix a constant
$c_1 > 0$, and suppose that $d_n \ge c_1 \ln n$. Then, there exist positive constants $c_2$ and $c_3$, such that:

(a) If $u_n \le c_2 d_n/\ln n$, then

$$\lim_{n \to \infty} \; \inf_{\lambda_n \in \Lambda_n(u_n)} \mathbb{P}_{G_n}\big(\lambda_n \in R(G_n)\big) = 1. \qquad (5)$$

(b) Conversely, if Un > Csd^/lnn and dn < n^'^, then 


$$\lim_{n \to \infty} \; \inf_{\lambda_n \in \Lambda_n(u_n)} \mathbb{P}_{G_n}\big(\lambda_n \in R(G_n)\big) = 0. \qquad (6)$$

























Tsitsiklis and Xu: Flexible Queueing Architectures 




We can use Theorem 3.6 to obtain a statement about "most" arrival rate vectors in $\Lambda_n(u_n)$
as follows. Suppose that $\lambda_n$ is drawn from an arbitrary distribution $\mu_n$ over $\Lambda_n(u_n)$,
independently from the randomness in $G_n$. Let $\mathbb{P}_{G_n} \times \mu_n$ be the product measure that describes the joint
distribution of $G_n$ and $\lambda_n$. Using Fubini's theorem, Eq. (5) implies that

$$\lim_{n \to \infty} \big(\mathbb{P}_{G_n} \times \mu_n\big)\big(\lambda_n \in R(G_n)\big) = 1. \qquad (7)$$

A further application of Fubini's theorem and an elementary argument (see the footnote below) imply that there exists
a sequence $\delta_n$ that converges to zero, such that the event

$$\mu_n\big(\lambda_n \in R(G_n) \mid G_n\big) > 1 - \delta_n \qquad (8)$$

has "high probability," with respect to the measure $\mathbb{P}_{G_n}$. That is, there is high probability that the
Random Modular architecture includes "most" arrival vectors $\lambda_n$ in $\Lambda_n(u_n)$.

We now turn to delay. The next theorem states that in a Modular architecture, delay is
vanishingly small for all arrival rate vectors in the capacity region that are not too close to its outer
boundary. The proof is given in Appendix A.5.

We need some notation. For any set $S$ and scalar $\gamma$, we let $\gamma S = \{\gamma x : x \in S\}$.


Theorem 3.7 (Delay of Modular Architectures) Fix some $\gamma \in (0,1)$, and consider a Modular
architecture $g_n$ for each $n$. There exists a constant $c > 0$, independent of $n$ and the sequence
$\{g_n\}$, so that

$$\mathbb{E}(W \mid \lambda_n) \le \exp(-c \cdot d_n), \qquad (9)$$

for every $\lambda_n \in \gamma R(g_n)$.

3.3.1. Expanded Modular Architectures. There is a further variant of the Modular
architecture that we call the Expanded Modular architecture, which combines the features of a Modular
architecture and an expander graph via a graph product. By construction, it uses part of the
system flexibility to achieve a large capacity region and part to achieve low delay. As a result, the
Expanded Modular architecture admits a smaller capacity region compared to that of an Expander
architecture. Another drawback is that the available performance guarantees involve policies that
require the knowledge of the arrival rates $\lambda_n$. On the positive side, it guarantees an asymptotically
vanishing delay for all arrival rates, uniformly across the capacity region, and can be operated by
a scheduling policy that is arguably simpler than in the Expander architecture. The construction
and a scheduling policy for the Expanded Modular architecture are given in Appendix B, along with
a statement of its performance guarantees (Theorem B.1). The technical details can be found in
m.


Footnote: We are using here the following elementary lemma. Let $A$ be an event with $\mathbb{P}(A) > 1 - \epsilon$, and let $X$ be a random
variable. Then, there exists a set $B$ with $\mathbb{P}(X \in B) > 1 - \sqrt{\epsilon}$ such that $\mathbb{P}(A \mid X) > 1 - \sqrt{\epsilon}$, whenever $X \in B$. The lemma is
applied by letting $A$ be the event $\{\lambda_n \in R(G_n)\}$ and letting $X = G_n$.
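The elementary lemma in the footnote is stated without proof. One way to see it, sketched here as an assumption about the intended argument rather than as the authors' own proof, is through Markov's inequality applied to the conditional probability of the complement of $A$. The symbol $Y$ is introduced only for this sketch.

```latex
% Let Y be the conditional probability of the complement of A given X.
Y = \mathbb{P}(A^c \mid X), \qquad \mathbb{E}[Y] = \mathbb{P}(A^c) < \epsilon .

% Take B to be the set of values of X for which Y is small.
B = \{ x : \mathbb{P}(A^c \mid X = x) < \sqrt{\epsilon} \} .

% Markov's inequality bounds the probability of falling outside B.
\mathbb{P}(X \notin B) = \mathbb{P}\big(Y \ge \sqrt{\epsilon}\big)
  \le \frac{\mathbb{E}[Y]}{\sqrt{\epsilon}} < \sqrt{\epsilon},
\qquad \text{so} \qquad \mathbb{P}(X \in B) > 1 - \sqrt{\epsilon} .

% On B, the conditional probability of A is large by construction.
X \in B \;\Longrightarrow\; \mathbb{P}(A \mid X) = 1 - Y > 1 - \sqrt{\epsilon} .
```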
4. Analysis of the Expander Architecture

In this section, we introduce a policy for the Expander architecture, based on batching and virtual
queues, which will then be used to prove Theorem 3.4. We begin by describing the basic idea at a
high level.


4.1. The Main Idea 

Our policy proceeds by collecting a fair number of arriving jobs to form batches. Batches are 
thought of as being stored in a virtual queue, with each batch treated as a single entity. By choosing 
the batch size large enough, one expects to see certain statistical regularities that can be exploited 
in order to efhciently handle the jobs within a batch. We now provide an outline of the operation 
of the policy, for a special case. 

Let us fix $n$ and consider the case where $\lambda_i = \lambda < 1$ for all $i$. Suppose that at time $t$, all servers are
busy serving some job. Let us also fix some $\gamma_n$ such that $\gamma_n \ll 1$, while $n\gamma_n$ is large. During the time
interval $[t, t + \gamma_n)$, "roughly" $\lambda n \gamma_n$ new jobs will arrive and $n\gamma_n$ servers will become available. Let
$\Gamma$ be the set of queues that received any job and let $\Delta$ be the set of servers that became available
during this interval. Since $\lambda n \gamma_n \ll n$, these incoming jobs are likely to be spread out across different
queues, so that most queues receive at most one job. Assuming that this is indeed the case, we
focus on $g_n|_{\Gamma \cup \Delta}$, that is, the connectivity graph $g_n$, restricted to $\Gamma \cup \Delta$. The key observation is that
this is a subgraph sampled uniformly at random among all subgraphs of $g_n$ with approximately
$\lambda n \gamma_n$ left nodes and $n\gamma_n$ right nodes. When $n\gamma_n$ is sufficiently large, and $g_n$ is well connected (as in
an expander with appropriate expansion properties), we expect that, with high probability, $g_n|_{\Gamma \cup \Delta}$
admits a matching that includes the entire set $\Gamma$ (i.e., a one-to-one mapping from $\Gamma$ to $\Delta$). In this
case, we can ensure that all of the roughly $\lambda n \gamma_n$ jobs can start receiving service at the end of the
interval, by assigning them to the available servers in $\Delta$ according to this particular matching.
Note that the resulting queueing delay will be comparable to $\gamma_n$, which has been assumed to be
small.

The above described scenario corresponds to the normal course of events. However, with a small 
probability, the above scenario may not materialize, due to statistical fluctuations, such as: 

1. Arrivals may be concentrated on a small number of queues.

2. The servers that become available may be located in a subset of $g_n$ that is not well connected
to the queues with arrivals.

In such cases, it may be impossible to assign the jobs in $\Gamma$ to servers in $\Delta$. These exceptional
cases will be handled by the policy in a different manner. However, if we can guarantee that the
probability of such cases is low, we can then argue that their impact on performance is negligible.

Whether or not the above mentioned exceptions will have low probability of occurring depends
on whether the underlying connectivity graph, $g_n$, has the following property: with high probability,
a randomly sampled sublinear (but still sufficiently large) subgraph of $g_n$ admits a large set of
"flows." This property will be used to guarantee that, with high probability, the jobs in $\Gamma$ can
indeed be assigned to distinct servers in the set $\Delta$. We will show that an expander graph with
appropriate expansion does possess this property.


4.2. An additional assumption 

Before proceeding, we introduce an additional assumption on the arrival rates, which will remain 
in effect throughout this section, and which will simplify some of the arguments. Appendix A.6 
explains why this assumption can be made without loss of generality. 


Assumption 4.1 (Lower Bound on the Total Arrival Rate) We have that $\rho \in (1/2, 1)$, and
the total arrival rate satisfies the lower bound

$$\sum_{i=1}^{n} \lambda_i \ge (1-\rho)\,n. \qquad (10)$$


4.3. The Policy 

We now describe in detail the scheduling policy. Besides n, the scheduling policy uses the following 
inputs: 


1. $\rho$, the traffic intensity introduced in Condition 2.1 in Section 2.2,

2. $\epsilon$, a positive constant such that $\rho + \epsilon < 1$,

3. $b_n$, a batch size parameter,

4. $g_n$, the connectivity graph.

Notice that the arrival rates, $\lambda_n$, and the fluctuation parameter, $u_n$, are not inputs to the scheduling
policy.

At this point it is useful to make a clarification regarding the $\lesssim$ notation. Recall that the relation
$f(n) \lesssim g(n)$ means that $f(n) \le c\,g(n)$, for all $n$, where $c$ is a positive constant. Whenever we use
this notation, we require that the constant $c$ cannot depend on any parameters other than $\rho$ and
$\epsilon$. Because we view $\rho$ and $\epsilon$ as fixed throughout, this makes $c$ an absolute constant.


4.3.1. Arrivals of Batches. Arriving jobs are organized in batches of cardinality $\rho b_n$, where
$b_n$ is a design parameter, to be specified later. Let $T^B_0 = 0$. For $k \ge 1$, let $T^B_k$ be the time of the
$(k\rho b_n)$th arrival to the system, which we also view as the arrival time of the $k$th batch. For $k \ge 1$,
the $k$th batch consists of the $\rho b_n$ jobs that arrive during the time interval $(T^B_{k-1}, T^B_k]$. The length
$A_k = T^B_k - T^B_{k-1}$ of this interval will be called the $k$th inter-arrival time. We record, in the next
lemma, some immediate statistical properties of the batch inter-arrival times.

Footnote: In a slight departure from the earlier informal description, we define batches by keeping track of the number of
arriving jobs as opposed to keeping track of time.


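Lemma 4.2 The batch inter-arrival times, $\{A_k\}_{k \ge 1}$, are i.i.d., with

$$\frac{b_n}{n} \le \mathbb{E}(A_k) \le \frac{\rho}{1-\rho} \cdot \frac{b_n}{n},$$

and $\mathrm{Var}(A_k) \lesssim b_n/n^2$.


Proof. The batch inter-arrival times are i.i.d., due to our independence assumptions on the job
arrivals. By definition, $A_k$ is equal in distribution to the time until a Poisson process records $\rho b_n$
arrivals. This Poisson process has rate $r = \sum_{i=1}^{n} \lambda_i$; using also Assumption 4.1 in the first
inequality below, we have

$$(1-\rho)\,n \le \sum_{i=1}^{n} \lambda_i = r \le \rho n.$$

The random variables $A_k$ are Erlang (sum of $\rho b_n$ exponentials with rate $r$). Therefore,

$$\mathbb{E}(A_k) = \rho b_n \cdot \frac{1}{r} \ge \rho b_n \cdot \frac{1}{\rho n} = \frac{b_n}{n}.$$

Similarly,

$$\mathbb{E}(A_k) = \rho b_n \cdot \frac{1}{r} \le \rho b_n \cdot \frac{1}{(1-\rho)\,n}.$$

Finally,

$$\mathrm{Var}(A_k) = \rho b_n \cdot \frac{1}{r^2} \le \rho b_n \cdot \frac{1}{(1-\rho)^2 n^2} \lesssim \frac{b_n}{n^2}.$$

Q.E.D.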
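The bounds of Lemma 4.2 can be checked numerically for concrete parameter values. The sketch below is illustrative only: the function names `erlang_moments` and `check_lemma_4_2` and the sample parameters are assumptions made here. It uses the standard fact that a sum of $k$ i.i.d. exponentials with rate $r$ has mean $k/r$ and variance $k/r^2$.

```python
def erlang_moments(k, r):
    """Mean and variance of a sum of k i.i.d. exponential variables with rate r."""
    return k / r, k / r ** 2

def check_lemma_4_2(n, rho, b_n, r):
    """Check the mean and variance bounds of Lemma 4.2 for a total arrival rate r.

    Requires (1 - rho) * n <= r <= rho * n, as in Assumption 4.1 and the
    traffic intensity condition.
    """
    if not (1 - rho) * n <= r <= rho * n:
        raise ValueError("total rate r is outside [(1 - rho) n, rho n]")
    k = rho * b_n                                   # number of jobs in a batch
    mean, var = erlang_moments(k, r)
    mean_lower = b_n / n
    mean_upper = rho / (1 - rho) * b_n / n
    var_upper = rho * b_n / ((1 - rho) ** 2 * n ** 2)
    return mean_lower <= mean <= mean_upper and var <= var_upper
```

For example, `check_lemma_4_2(1000, 0.8, 50, 600.0)` evaluates the bounds for a batch of 40 jobs and a total arrival rate of 600.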


4.3.2. The Virtual Queue Upon arrival, batches are placed in what we refer to as a virtual 
queue. The virtual queue is a GI/G/1 queue, which is operated in FIFO fashion. That is, a batch 
waits in queue until all previous batches are served, and then starts being served by a virtual 
queueing system. The service of a batch by the virtual queueing system lasts until a certain time 
by which all jobs in the batch have already been assigned to, and have started receiving service 
from, one of the physical servers, at which point the service of the batch is completed and the 
batch departs from the virtual queue. The time elapsed from the initiation of service of a batch until
its departure is called the service time of the batch. As a consequence, the queueing delay of a job
in the actual (physical) system is bounded above by the sum of:

(a) the time from the arrival of the job until the arrival time of the batch that the job belongs
to;

(b) the time that the batch waits in the virtual queue;

(c) the service time of the batch.

Service slots. The service of the batches at the virtual queue is organized along consecutive
time intervals that we refer to as service slots. The service slots are intervals of the form $(ls, (l+1)s]$,
where $l$ is a nonnegative integer, whose length is

$$s = (\rho + \epsilon)\,\frac{b_n}{n}.$$

We will arrange matters so that batches can complete service and depart only at the end of a
service slot, that is, at times of the form $ls$. Furthermore, we assume that the physical servers are
operated as follows. If either a batch completes service at time $ls$ or if there are no batches present
at the virtual queue at that time, we assign to every idle server a dummy job whose duration is
an independent exponential random variable, with mean 1. This ensures that the state of the $n$
servers is the same (all of them are busy) at certain special times, thus facilitating further analysis,
albeit at the cost of some inefficiency.

4.3.3. The Service Time of a Batch. The specification of the service time of a batch 
depends on whether the batch, upon arrival, finds an empty or nonempty virtual queue. 


[Figure 4: diagram of the service slot dynamics, with labels "assign jobs in a new batch to idle servers", "batch assignment succeeds", "batch assignment fails", "clear the current batch greedily", "batch remains", and "batch departs".]

Figure 4 An illustration of the service slot dynamics. An arrow indicates the transition from the end of one
service slot to the next.


Suppose that a batch arrives during the service slot $(ls, (l+1)s]$ and finds an empty virtual
queue; that is, all previous batches have departed by time $ls$. According to what was mentioned
earlier, at time $ls$, all physical servers are busy, serving either real or dummy jobs. Up until the end
of the service slot, any server that completes service is not assigned a new (real or dummy) job,
and remains idle, available to be assigned a job at the very end of the service slot. Let $\Delta$ be the set
of servers that are idle at time $(l+1)s$, the end of the service slot. At that time, we focus on the
jobs in the batch under consideration. We wish to assign each job $i$ in this batch to a distinct server
$j \in \Delta$, subject to the constraint that $(i,j) \in E$. We shall refer to such a job-to-server assignment
as a batch assignment. There are two possibilities (cf. Figure 4):

(a) If a batch assignment can be found, each job in the batch is assigned to a server according
to that assignment, and the batch departs at time $(l+1)s$. In this case, we say that the service
time of the batch was short.

(b) If a batch assignment cannot be found, we start assigning the jobs in the batch to physical
servers in some arbitrary greedy manner: Whenever a server $j$ becomes available, we assign to it a
job from the batch under consideration, and from some queue $i$ with $(i,j) \in E$, as long as such a
job exists. (Ties are broken arbitrarily.) As long as every queue is connected to at least one server,
all jobs in the associated batch will be eventually assigned. The last of the jobs in the batch gets
assigned during a subsequent service interval $(l's, (l'+1)s]$, where $l' > l$, and we define $(l'+1)s$ as
the departure time of the batch.

If the $k$th batch did indeed find an empty virtual queue upon arrival, its service time, denoted by
$S_k$, is the time elapsed from its arrival until its departure.

Footnote: To see how the length of the service slot was chosen, recall that the size of each batch is equal to $\rho b_n$. The length
of the service slot hence ensures that $(\rho + \epsilon) b_n$, the expected number of servers that will become available (and can
therefore be assigned to jobs) during a single service slot, is greater than the size of a batch, so that there is hope of
assigning all of these jobs to available servers within a single service slot. At the same time, since $\rho + \epsilon < 1$, service
slots are shorter than the expected batch inter-arrival time, which is needed for the stability of the virtual queue.

Suppose now that a batch arrives during a service slot $(ls, (l+1)s]$ and finds a non-empty virtual
queue; that is, there are one or more batches that arrived earlier and which have not departed by
time $ls$. In this case, the batch waits in the virtual queue until some time of the form $l's$, with
$l' > l$, when the last of the previous batches departs. Recall that, as specified earlier, at time $l's$
all servers are made to be busy (perhaps, by giving them dummy jobs) and we are faced with a
situation identical to the one considered in the previous case, as if the batch under consideration
just arrived at time $l's$; in particular, the same service policy can be applied. For this case, where
the $k$th batch arrives to find a non-empty virtual queue, its service time, $S_k$, extends from the time
of the departure of the $(k-1)$st batch until the departure of the $k$th batch.

4.4. Bounding the Virtual Queue by a GI/GI/1 Queue 

Having defined the inter-arrival and service times of the batches, the virtual queue is a fully 
specified, work-conserving, FIFO single-server queueing system. 

We note however one complication. The service times of the different batches are dependent on 
the arrival times. To see this, suppose, for example, that a batch upon arrival sees an empty virtual 
queue and that its service time is “short.” Then, its service time will be equal to the remaining 
time until the end of the current service slot, and therefore dependent on the batch’s arrival time. 
Furthermore, the service times of different batches are dependent: if the service time of the previous 
batch happens to be too long, then the next batch is likely to see upon arrival a non-empty virtual 
queue, which then implies that its own service time will be an integer multiple of s. 

In order to get around these complications, and to be able to use results on GI/GI/1 queues, we
define the modified service time, $S'_k$, of the $k$th batch to be equal to $S_k$, rounded up to
the nearest integer multiple of $s$:

$$S'_k = \min\{ls : ls \ge S_k, \; l = 1, 2, \ldots\}.$$

Clearly, we have $S_k \le S'_k$.

We now consider a modified (but again FIFO and work-conserving) virtual queueing system in
which the arrival times are the same as before, but the service times are the $S'_k$. A simple coupling
argument, based on Lindley's recursion, shows that for every sample path, the time that the batch
spends waiting in the queue of the original virtual queueing system is less than or equal to the
time spent waiting in the queue of the modified virtual queueing system. It therefore suffices to
upper bound the expected time spent in the queue of the modified virtual queueing system.
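The coupling argument can be illustrated with a short sketch. It assumes the standard form of Lindley's recursion for a FIFO single-server queue that starts empty, $W_{k+1} = \max(0, W_k + S_k - A_{k+1})$, and simply treats the service times as given numbers; the function names and the sample values are hypothetical.

```python
import math

def round_up_to_slot(service, s):
    """Smallest positive integer multiple of s that is >= service."""
    return max(1, math.ceil(service / s)) * s

def lindley_waits(interarrivals, services):
    """Waiting times of successive batches in a FIFO single-server queue.

    interarrivals[k] is the time between the arrivals of batch k and batch k+1.
    The first batch finds an empty queue, so its waiting time is 0.
    """
    waits = [0.0]
    for gap, service in zip(interarrivals, services):
        # Lindley's recursion: the wait grows by the service time of the
        # previous batch and shrinks by the time until the next arrival.
        waits.append(max(0.0, waits[-1] + service - gap))
    return waits

s = 0.5
services = [0.3, 1.2, 0.5, 0.7]
modified = [round_up_to_slot(x, s) for x in services]
gaps = [0.6, 0.6, 0.6]
original_waits = lindley_waits(gaps, services)
modified_waits = lindley_waits(gaps, modified)
```

Because the recursion is monotone in the service times, each entry of `modified_waits` is at least the corresponding entry of `original_waits` on the same sample path.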

We now argue that the modified virtual queueing system is a GI/GI/1 queue, i.e., that the
service times are i.i.d., and independent from the arrival process. For a batch whose service
starts during the service slot $[ls, (l+1)s)$, the modified service time is equal to $s$, whenever the
batch service time is short. Whether the batch service time will be short or not is determined by
the composition of the jobs in this batch and by the identities of the servers who complete service
during the service slot $[ls, (l+1)s)$. Because the servers start at the same "state" (all busy) at each
service slot, it follows that the events that determine whether a batch service time will be short or
not are independent across batches, and with the same associated probabilities.

Similarly, if a batch service time is not short, the additional time to serve the jobs in the batch
is affected only by the composition of jobs in the batch and the service completions at the physical
servers after time $ls$, and these are again independent from the inter-arrival times and the modified
service times of other batches. Finally, the same considerations show the independence of
the $S'_k$ from the batch arrival process.

It should now be clear from the above discussion that the modified service time of a batch is of
the form

$$S'_k = s + X_k \cdot \hat{S}_k, \qquad (11)$$

where:

(a) $X_k$ is a Bernoulli random variable which is equal to 1 if and only if the $k$th batch service
time is not short, i.e., it takes more than a single service slot;

(b) $\hat{S}_k$ is a random variable which (assuming that every queue is connected to at least one server)
is stochastically dominated by the sum of $\rho b_n$ independent exponential random variables with mean
1, rounded up to the nearest multiple of $s$. (This dominating random variable corresponds to the
extreme case where all of the $\rho b_n$ jobs in the batch are to be served in sequence, by the same
physical server.)

(c) The pairs $(X_k, \hat{S}_k)$ are i.i.d.
4.5. Bounds on the Modified Service Times

For the remainder of Section 4, we will assume that

$$g_n \text{ is a } (\gamma/\beta_n, \beta_n)\text{-expander}, \qquad (12)$$

where $\gamma$ and $\beta_n$ are defined as in the statement of Theorem 3.4.

The main idea behind the rest of the proof is as follows. We will upper bound the expected time
spent in the modified virtual queueing system using Kingman's bound [IB] for GI/GI/1 queues.
Indeed, the combination of a batching policy with Kingman's bound is a fairly standard technique
for deriving delay upper bounds (see, e.g., [25]). We already have bounds on the mean and variance
of the inter-arrival times. In order to apply Kingman's bound, it remains to obtain bounds on the
mean and variance of the service times $S'_k$ of the modified virtual queueing system.

We now introduce an important quantity associated with a graph $g_n$, by defining

$$q(g_n) = \mathbb{P}(X_k = 1 \mid g_n);$$

because of the i.i.d. properties of the batch service times, this quantity does not depend on $k$. In
words, for a given connectivity graph $g_n$, the quantity $q(g_n)$ stands for the probability that we
cannot find a batch assignment, between the jobs in a batch and the servers that become idle
during a period of length $s$.

We begin with the following lemma, which provides bounds on the mean and variance of $S'_k$.


Lemma 4.3 There exists a sequence, $\{c_n\}_{n \in \mathbb{N}}$, with $c_n \lesssim b_n$, such that for all $n \ge 1$,

$$s \le \mathbb{E}(S'_k \mid g_n) \le s + q(g_n)\,c_n,$$

and

$$\mathrm{Var}(S'_k \mid g_n) \le q(g_n)\,c_n^2.$$

Proof. The fact that $\mathbb{E}(S'_k \mid g_n) \ge s$ follows from the definition of $S'_k$ in Eq. (11) and the
nonnegativity of $X_k \hat{S}_k$. The definition of an expander ensures that every queue is connected to at least
one server through $g_n$. Recall that $X_k \hat{S}_k$ is zero if $X_k = 0$; on the other hand, if $X_k = 1$, and as long as
every queue is connected to some server, then $\hat{S}_k$ is upper bounded by the sum of $\rho b_n$ exponential
random variables with mean 1, rounded up to an integer multiple of $s$. Therefore,

$$\mathbb{E}(S'_k \mid g_n) = s + \mathbb{E}(X_k \hat{S}_k \mid g_n) = s + \mathbb{P}(X_k = 1 \mid g_n) \cdot \mathbb{E}(\hat{S}_k \mid X_k = 1, g_n) \le s + q(g_n)(\rho b_n + s),$$

which leads to the first bound in the statement of the lemma, with $c_n = b_n + s$. Since $s$ is proportional
to $b_n/n$, we also have $c_n \lesssim b_n$, as claimed. Furthermore,

$$\mathrm{Var}(S'_k \mid g_n) = \mathrm{Var}(X_k \hat{S}_k \mid g_n) \le \mathbb{E}(X_k^2 \hat{S}_k^2 \mid g_n) \le q(g_n)(b_n + s)^2 = q(g_n)\,c_n^2.$$

Q.E.D.

We now need to obtain bounds on $q(g_n)$. This is nontrivial and forms the core of the proof
of the theorem. In what follows, we will show that with appropriate assumptions on the various
parameters, and for any $\lambda \in \Lambda(u_n)$, an Erdős-Rényi random graph has a very small $q(g_n)$, with
high probability.


4.6. Assumptions on the Various Parameters 

From now on, we focus on a specific batch size parameter of the form

$$b_n = \frac{320}{(1-\rho)^2} \cdot \frac{n \ln n}{\beta_n}. \qquad (13)$$

We shall also set

$$\epsilon = \frac{1-\rho}{2}. \qquad (14)$$

We assume, as in the statement of Theorem 3.4, that $d_n \ll n$, and that

$$d_n \ge \beta_n \gg \ln n. \qquad (15)$$

Under these choices of $b_n$ and $\beta_n$, we have

$$b_n \lesssim \frac{n}{\beta_n/\ln n} \ll n; \qquad (16)$$

that is, the batch size is vanishingly small compared to $n$. Finally, we will only consider arrival rate


vectors that belong to the set A„(n„) (cf. Condition 2.1), where, as in the statement of Theorem 

^-p, 


Un < 


-Pn- 


(17) 


4.7. The Probability of a Short Batch Service Time 

We now come to the core of the proof, aiming to show that if the connectivity graph $g_n$ is an
expander graph with a sufficiently large expansion factor, then $q(g_n)$ is small. More precisely, we
aim to show that a typical batch will have high probability of having a short service time. A
concrete statement is given in the result that follows, and the rest of this subsection will be devoted
to its proof.


Proposition 4.4 Fix $n \ge 1$. We have that

$$q(g_n) \le \frac{1}{n^2}. \qquad (18)$$

Let us focus on a particular batch, and let us examine what it takes for its service time to be
short. There are two sources of randomness:

1. A total of $\rho b_n$ jobs arrive to the queues. Let $A_i$ be the number of jobs that arrive at the $i$th
queue, let $A = (A_1, \ldots, A_n)$, and let $\Gamma$ be the set of queues that receive at least one job. In
particular, we have

$$\sum_{i=1}^{n} A_i = \rho b_n.$$

2. During the time slot at which the service of the batch starts, each server starts busy (with
a real or dummy job). With some probability, and independently from other servers or from
the arrival process, a server becomes idle by the end of the service time slot. Let $\Delta$ be the set
of servers that become idle.

Recalling the definition of $X_k$ as the indicator random variable of the event that the service time of
the $k$th batch is not short, we see that $X_k$ is completely determined by the graph $g_n$ together with
$A$ and $\Delta$. For the remainder of this subsection, we suppress the subscript $k$, since we are focusing
on a particular batch. We therefore have a dependence of the form

$$X = f(g_n, A, \Delta),$$

for some function $f$, and we emphasize the fact that $A$ and $\Delta$ are independent.


Recall that $\epsilon = (1-\rho)/2$, and from the statement of Theorem 3.4 that

$$\hat{\rho} = \frac{1}{1 + (1-\rho)/8} = \frac{1}{1 + \epsilon/4}. \qquad (19)$$

Clearly, $\hat{\rho} < 1$, and with some elementary algebra, it is not difficult to show that, for any given
$\rho \in (0,1)$,

$$\hat{\rho} > \rho.$$
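The elementary algebra is not spelled out in the text. One way to carry it out, using only the definition $\hat{\rho} = 1/(1 + (1-\rho)/8)$ and the fact that $1 - \rho > 0$, is the following chain of equivalences.

```latex
\hat{\rho} > \rho
\;\iff\; 1 > \rho \left( 1 + \frac{1-\rho}{8} \right)
\;\iff\; 1 - \rho > \frac{\rho\,(1-\rho)}{8}
\;\iff\; 1 > \frac{\rho}{8},
```

The last inequality holds for every $\rho \in (0,1)$; the final step divides both sides by $1 - \rho > 0$.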


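Let

$$m_n = \frac{\rho}{\hat{\rho}}\, b_n,$$

so that

$$\hat{\rho}\, m_n = \rho\, b_n. \qquad (20)$$

Finally, let

$$\hat{u}_n = \beta_n \cdot \frac{m_n}{n}.$$

We will say that $A$ is nice if there exists a set $\hat{\Gamma} \supseteq \Gamma$ of cardinality $m_n$, such that $A_i = 0$ whenever
$i \notin \hat{\Gamma}$, and

$$A_i \le \hat{u}_n, \qquad \forall\, i \in \hat{\Gamma}.$$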

We now establish that $A$ is nice, with high probability. The main idea is simple: $A$ is not nice
only if one out of a finite collection of binomial variables with large means takes a value which is
away from its mean by a certain multiplicative factor. Using the Chernoff bound, this probability
can be shown to decay at least as fast as $1/n^3$. The details of this argument are given in the proof
of Lemma 4.5 in Appendix A.7.


Lemma 4.5 For all sufficiently large $n$, we have that

$$\mathbb{P}(A \text{ is not nice}) \le \frac{1}{n^3}.$$

We now wish to establish that when $A$ is nice, there is high probability (with respect to $\Delta$), that
the batch service time will be short. Having a short batch service time is, by definition, equivalent
to the existence of a batch assignment, which in turn is equivalent to the existence of a certain flow
in a subgraph of $g_n$. The lemma that follows deals with the latter existence problem for the original
graph, but will be later applied to subgraphs. Let $\bar{R}(g)$ be the closure of the capacity region, $R(g)$,
of $g$.


Lemma 4.6 Fix $n, n' \in \mathbb{N}$, $\rho \in (0,1)$, and $\gamma > \rho$. Suppose that an $n \times n'$ bipartite graph, $g_n$, is a
$(\gamma/\beta_n, \beta_n)$-expander, where $\beta_n \ge u_n$. Then $\Lambda_n(u_n) \subset \bar{R}(g_n)$.


Proof. The claim follows directly from Lemma 3.2, by noting that $\bar{R}(g_n) \supset R(g_n)$.


Q.E.D.


The next lemma is the key technical result of this subsection. It states that if $g_n$ is an expander,
then, for any given $\hat{\Gamma}$, the random subgraph $g_n|_{\hat{\Gamma} \cup \Delta}$ will be an expander graph with high probability
(with respect to $\Delta$). The lemma is stated as a stand-alone result, though we will use a notation
that is consistent with the rest of the section. The proof relies on a delicate application of the
Chernoff bound, and is given in Appendix A.8.


Lemma 4.7 Fix $n \ge 1$, $\gamma \in (0,1)$, and $\rho \in [1/2, 1)$. Let $g_n = (I \cup J, E)$ be an $n \times n$ bipartite graph
that is a $(\gamma/\beta_n, \beta_n)$-expander, where $\beta_n \gg \ln n$. Define the following quantities:

$$\epsilon = \frac{1-\rho}{2}, \qquad \hat{\rho} = \frac{1}{1 + \epsilon/4}, \qquad b_n = \frac{320}{(1-\rho)^2} \cdot \frac{n \ln n}{\beta_n} = \frac{80}{\epsilon^2} \cdot \frac{n \ln n}{\beta_n}, \qquad m_n = \frac{\rho}{\hat{\rho}}\, b_n, \qquad \hat{u}_n = \beta_n \cdot \frac{m_n}{n}. \qquad (21)$$

Let $\hat{\Gamma}$ be an arbitrary subset of the left vertices, $I$, such that

$$|\hat{\Gamma}| = m_n, \qquad (22)$$

and let $\Delta$ be a random subset of the right vertices, $J$, where each vertex belongs to $\Delta$ independently
and with the same probability, where

$$\mathbb{P}(j \in \Delta) \ge (\rho + 3\epsilon/4)\,\frac{b_n}{n}, \qquad \forall\, j \in J, \qquad (23)$$

for all $n$ sufficiently large. Denote by $G$ the random subgraph $g_n|_{\hat{\Gamma} \cup \Delta}$. Then

$$\mathbb{P}\big(G \text{ is not a } (\gamma/\hat{u}_n, \hat{u}_n)\text{-expander}\big) \le \frac{1}{n^3}, \qquad (24)$$

for all $n$ sufficiently large, where the probability is measured with respect to the randomness in $\Delta$.


To invoke Lemma 4.7, note that the conditions in Eq. (21) are identical to the definitions for the
corresponding quantities in this section. We next verify that Eq. (23) is satisfied by the random
subset, $\Delta$, consisting of the idle servers at the end of a service slot. Recall that the length of a
service slot is $\frac{b_n}{n}(\rho + \epsilon)$, and hence the probability that a given server, $j$, becomes idle by the end
of a service slot is

$$\mathbb{P}(j \in \Delta) = 1 - \exp\left(-(\rho + \epsilon)\,\frac{b_n}{n}\right) \sim (\rho + \epsilon)\,\frac{b_n}{n}, \qquad (25)$$

as $n \to \infty$. Therefore, for all $n$ sufficiently large, we have that $\mathbb{P}(j \in \Delta) > (\rho + 3\epsilon/4)\,\frac{b_n}{n}$. We will
now apply Lemmas 4.6 and 4.7 to the random subgraph with left (respectively, right) nodes $\hat{\Gamma}$
(respectively $\Delta$), and with the demands $A_i$, for $i \in \hat{\Gamma}$, playing the role of $\lambda$.
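The step from Eq. (25) to Eq. (23) relies on $b_n/n$ being small. A small numerical sketch makes this concrete. It assumes that a busy server has an exponential service time with mean 1, as stated for the dummy jobs, and takes $\epsilon = (1-\rho)/2$; the function names and sample values are hypothetical.

```python
import math

def idle_probability(rho, b_n, n):
    """P(j in Delta) = 1 - exp(-s), where s = (rho + eps) * b_n / n is the slot length."""
    eps = (1.0 - rho) / 2.0
    s = (rho + eps) * b_n / n
    return -math.expm1(-s)            # accurate evaluation of 1 - exp(-s) for small s

def satisfies_eq_23(rho, b_n, n):
    """Check whether the idle probability meets the lower bound required in Eq. (23)."""
    eps = (1.0 - rho) / 2.0
    return idle_probability(rho, b_n, n) >= (rho + 0.75 * eps) * b_n / n
```

With `rho = 0.6`, the bound holds when `b_n / n = 0.01` but fails when `b_n = n`, which illustrates why the condition is only claimed for all sufficiently large $n$, where $b_n \ll n$.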


Lemma 4.8 If $n$ is large enough, and if the value $a$ of $A$ is nice, then

$$\mathbb{P}(X = 1 \mid A = a) \le \frac{1}{n^3},$$

where the probability is with respect to the randomness in $\Delta$.

Proof. We fix some a, assumed to be nice. Recall that 


and from the statement of Theorem 13.41 that 


’=\rp>, 


We apply Lemma 4.6 to the randomly sampled subgraph $G$, with left nodes $\hat{\Gamma}$, $|\hat{\Gamma}| = m_n$, and right
nodes $\Delta$. We have the following correspondence: the parameters $n$ and $\rho$ in Lemma 4.6 become, in
the current context, $m_n$ and $\hat{\rho}$, respectively, and the parameters $\beta_n$ and $u_n$ both become $\hat{u}_n$. Thus,
by Lemma 4.6,

$$\text{if } G \text{ is a } (\gamma/\hat{u}_n, \hat{u}_n)\text{-expander, then } \Lambda_{m_n}(\hat{u}_n) \subset \bar{R}(G). \qquad (26)$$

Let $\hat{A}$ be the vector of job arrival numbers $A$, restricted to the set of nodes in $\hat{\Gamma}$, and let $\hat{a}$ be
the realization of $\hat{A}$. Note that we have

$$\sum_{i \in \hat{\Gamma}} \hat{a}_i = \sum_{i=1}^{n} a_i = \rho b_n = \hat{\rho}\, m_n,$$

because of Eq. (20). Furthermore, for any $i \in \hat{\Gamma}$, the fact that $a$ is nice implies that $\hat{a}_i \le \hat{u}_n$. Thus,
$\hat{a} \in \Lambda_{m_n}(\hat{u}_n)$. By Eq. (26), this further implies that

$$\text{if } G \text{ is a } (\gamma/\hat{u}_n, \hat{u}_n)\text{-expander, then } \hat{a} \in \bar{R}(G). \qquad (27)$$

By Lemma 4.7, the graph $G$ is a $(\gamma/\hat{u}_n, \hat{u}_n)$-expander with probability at least $1 - 1/n^3$. Combining
this fact with Eq. (27), we have thus verified that $\hat{a}$ belongs to $\bar{R}(G)$, with probability at least
$1 - 1/n^3$.

With $\bar{R}(G)$ having been defined as the closure of the capacity region, $R(G)$ (cf. Definition 2.2),
the fact that the vector $\hat{a}$ belongs to $\bar{R}(G)$ is a statement about the existence of a feasible flow,
$\{f_{ij} : (i,j) \in E\}$ (where $E$ is the set of edges in $G$), in a linear network flow model of the form

$$\hat{a}_i = \sum_{j : (i,j) \in E} f_{ij}, \qquad \forall\, i \in \hat{\Gamma},$$

$$\sum_{i : (i,j) \in E} f_{ij} \le 1, \qquad \forall\, j \in \Delta,$$

$$f_{ij} \ge 0, \qquad \forall\, (i,j) \in E.$$

Because the "supplies" $\hat{a}_i$ in this network flow model, as well as the unit capacities of the right nodes,
are integer, it is well known that there also exists an integer flow. That is, we can find $f_{ij} \in \{0,1\}$
such that $\sum_j f_{ij} = \hat{a}_i$, for all $i$, and $\sum_i f_{ij} \le 1$, for all $j$. But this is the same as the statement that
there exists a feasible batch assignment over $G$. Thus, for large enough $n$ and for any given nice
$a$, the conditional probability that a batch assignment does not exist is upper bounded by $n^{-3}$, as
claimed. Q.E.D.
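The equivalence between an integer flow and a batch assignment suggests a direct way to search for one. The sketch below is not the authors' procedure: it assumes a plain augmenting-path bipartite matching in which each job is treated as its own left node, and the function name and data layout are hypothetical.

```python
def find_batch_assignment(jobs_per_queue, idle_servers, edges):
    """Try to assign every job in a batch to a distinct idle server.

    jobs_per_queue: dict mapping queue i to its number of jobs (the integer supplies)
    idle_servers:   iterable of idle servers (the set Delta)
    edges:          set of (queue, server) pairs in the connectivity graph
    Returns a sorted list of (queue, server) pairs, or None if no assignment exists.
    """
    idle = set(idle_servers)
    neighbors = {}
    for i, j in edges:
        if j in idle:
            neighbors.setdefault(i, []).append(j)
    # One entry per job, so that integer supplies reduce to a matching problem.
    jobs = [i for i, count in jobs_per_queue.items() for _ in range(count)]
    owner = {}                               # server -> index of the job it serves

    def augment(job, seen):
        """Search for an augmenting path that starts at the given job."""
        for j in neighbors.get(jobs[job], []):
            if j in seen:
                continue
            seen.add(j)
            if j not in owner or augment(owner[j], seen):
                owner[j] = job
                return True
        return False

    for job in range(len(jobs)):
        if not augment(job, set()):
            return None                      # the batch service time is not short
    return sorted((jobs[job], j) for j, job in owner.items())
```

A return value of `None` corresponds to the event $X = 1$, in which case the policy falls back to the greedy clearing rule of case (b).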

We can now complete the proof of Proposition 4.4. By considering unconditional probabilities
where $A$ is random, and for $n$ large enough, we have that

$$
\begin{aligned}
\mathbb{P}(X = 1) &\le \mathbb{P}(A \text{ is not nice}) + \sum_{a \text{ nice}} \mathbb{P}(X = 1 \mid A = a) \cdot \mathbb{P}(A = a) \\
&\overset{(a)}{\le} \frac{1}{n^3} + \sum_{a \text{ nice}} \mathbb{P}(X = 1 \mid A = a) \cdot \mathbb{P}(A = a) \\
&\overset{(b)}{\le} \frac{1}{n^3} + \frac{1}{n^3} \sum_{a \text{ nice}} \mathbb{P}(A = a) \\
&\le \frac{2}{n^3} \le \frac{1}{n^2},
\end{aligned}
\qquad (28)
$$

where steps (a) and (b) follow from Lemmas 4.5 and 4.8, respectively. This concludes the proof of
Proposition 4.4.

4.8. Service and Waiting Time Bounds for the Virtual Queue 


4.8.1. Service Time Bounds. We will now use Lemma 4.3 and Proposition 4.4 to bound the mean and variance of the service times in the modified virtual queue.

Lemma 4.9 The modified batch service times, $S'_k$, are i.i.d., with

\[ \mathbb{E}(S'_k \mid g_n) \sim (\rho + \epsilon)\frac{b_n}{n}, \quad \text{and} \quad \mathrm{Var}(S'_k \mid g_n) \lesssim \frac{b_n^2}{n^2}. \]

Proof. We use the fact from Lemma 4.3 that $s \le \mathbb{E}(S'_k \mid g_n) \le s + q(g_n) c_n$, where $c_n \le b_n$. We recall that $s = (\rho + \epsilon) b_n / n$, and use the fact $q(g_n) \le n^{-2}$, as guaranteed by Proposition 4.4. The term $q(g_n) c_n$ satisfies $q(g_n) c_n \le b_n / n^2$, which is of lower order than $b_n / n$, and hence negligible compared to $s$. This proves the first part of the lemma.

For the second part, we use Lemma 4.3 in the first inequality below, and the fact that $q(g_n) \le n^{-2}$ in the second, to obtain

\[ \mathrm{Var}(S'_k \mid g_n) \le q(g_n) c_n^2 \le \frac{b_n^2}{n^2}. \]

Q.E.D.

4.8.2. Waiting Time Bounds. Fix $n$ and the graph $g_n$. Let $W^B$ be a random variable whose distribution is the same as the steady-state distribution of the time that a batch spends waiting in the queue of the virtual queueing system introduced in Section 4.3.2.

Proposition 4.10 We have that

\[ \mathbb{E}(W^B \mid g_n) \lesssim \frac{b_n}{n}. \tag{29} \]


Proof. As discussed in Section 4.4, the waiting time of a batch, in the virtual queueing system, is dominated by the waiting time in a modified virtual queueing system, which is a GI/GI/1 queue, with independent inter-arrival times $A_k$ (defined in Section 4.3.1) and independent service times $S'_k$. Let $W'$ be a random variable whose distribution is the same as the steady-state distribution of the time that a batch spends waiting in the queue of the modified virtual queueing system. According to Kingman's bound [TH], $W'$ satisfies

\[ \mathbb{E}(W' \mid g_n) \le \tilde{\lambda}\,\frac{\sigma_a^2 + \sigma_s^2}{2(1 - \tilde{\rho})}, \]

where $\tilde{\lambda}$ is the arrival rate, $\tilde{\rho}$ is the traffic intensity, and $\sigma_a^2$ and $\sigma_s^2$ are the variances of the inter-arrival times and service times, respectively, that are associated with the modified virtual queueing system.

From Lemma 4.2, we have

\[ \tilde{\lambda} = \frac{1}{\mathbb{E}(A_k)} \le \frac{n}{b_n}, \]

and

\[ \sigma_a^2 = \mathrm{Var}(A_k) \lesssim \frac{b_n}{n^2}. \]

We now bound

\[ \tilde{\rho} = \frac{\mathbb{E}(S'_k \mid g_n)}{\mathbb{E}(A_k)}. \]

From the first part of Lemma 4.9, we have $\mathbb{E}(S'_k \mid g_n) \sim (\rho + \epsilon) b_n / n$. Together with the bound $1/\mathbb{E}(A_k) \le n / b_n$, we obtain that as $n \to \infty$, $\tilde{\rho}$ is upper bounded by a number strictly less than 1. We also have, from the second part of Lemma 4.9,

\[ \sigma_s^2 = \mathrm{Var}(S'_k \mid g_n) \lesssim \frac{b_n^2}{n^2}. \]

Using these inequalities in Kingman's bound, we obtain

\[ \mathbb{E}(W' \mid g_n) \lesssim \frac{b_n}{n}. \]

Q.E.D.
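To make the use of Kingman's bound concrete, the short Python sketch below evaluates the standard form of the bound, $\lambda(\sigma_a^2 + \sigma_s^2)/(2(1-\rho))$, for a GI/GI/1 queue. The function and argument names are illustrative assumptions and do not come from the paper.

```python
def kingman_bound(arrival_rate: float, var_interarrival: float,
                  var_service: float, traffic_intensity: float) -> float:
    """Kingman's upper bound on the mean waiting time in a GI/GI/1 queue.

    Returns arrival_rate * (var_interarrival + var_service)
            / (2 * (1 - traffic_intensity)).
    """
    if not 0.0 <= traffic_intensity < 1.0:
        raise ValueError("traffic intensity must lie in [0, 1)")
    return arrival_rate * (var_interarrival + var_service) / (
        2.0 * (1.0 - traffic_intensity)
    )


# Sanity check on an M/M/1 queue with arrival rate 0.5 and service rate 1:
# inter-arrival variance is 1/0.5**2 = 4, service variance is 1, and the
# exact mean wait in queue is rho / (mu - lambda) = 1, below the bound of 2.5.
mm1_bound = kingman_bound(0.5, 4.0, 1.0, 0.5)
```

In the setting of Proposition 4.10, the arrival rate is of order $n/b_n$ and both variances are at most of order $b_n^2/n^2$, so the bound is of order $b_n/n$ as long as the traffic intensity stays bounded away from 1.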


4.9. Completing the Proof of Theorem 3.4

Proof. As discussed in Section 4.3.2, the expected waiting time of a job is upper bounded by the sum of three quantities.

(a) The expected time from the arrival of the job until the arrival time of the batch that the job belongs to. This is bounded above by the expected time until there are $\rho b_n$ subsequent arrivals, which is equal to $\mathbb{E}(A_1)$. By Lemma 4.2, this is bounded above by $c_1 b_n / n$, for some constant $c_1$.

(b) The expected time that the batch waits in the virtual queue. This is also upper bounded by $c_2 b_n / n$, by Proposition 4.10, for some constant $c_2$.

(c) The service time of the batch, which (by Lemma 4.9) again admits an upper bound of the form $c_3 b_n / n$, for some constant $c_3$.

Furthermore, in the results that give these upper bounds, $c_1$, $c_2$, and $c_3$ are absolute constants that do not depend on $\lambda_n$ or $g_n$.

By our assumptions on the choice of $b_n$ in Section 4.6, we have $b_n = \frac{320}{(1-\rho)^2} \cdot \frac{n \ln n}{\beta_n}$, and $\beta_n$ is proportional to $d_n$. We conclude that there exists a constant $c$ such that for large enough $n$, we have $\mathbb{E}(W \mid g_n, \lambda_n) \le c \ln n / d_n$, for any given $\lambda_n \in \Lambda_n(u_n)$, which is an upper bound of the desired form. This establishes Part 1 of the theorem. Finally, Part 2 follows from the way that the policy was constructed. Q.E.D.
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Figure 5 



Simulations of the virtual-queue based policy given in Section 4.3 with d„ = ,bn = n ln(n)/d„, and 


\i = 0.5 for all i = 1,..., n. The boxplot contains the average delay from 50 runs of simulations where 
the job size distribution is assumed to be exponential with mean 1. Each run is performed on a random 
d„-regular graph over 10^ service slots and a 1000-slot burn-in period. The center line of a box represents 
the median and upper and lower edges of the box represent the 25th and 75th percentiles, respectively. 
The dashed line depicts the median average waiting times when the job sizes are distributed according 
to a log-normal distribution with mean 1 and variance 10. 


4.10. On Practical Policies 

Figure 5 provides simulation results for the average delay under the virtual-queue based scheduling policy used in proving Theorem 3.4. The main role of the policy is to demonstrate the fundamental potential of the Expander architecture in jointly achieving a small delay and large capacity region when the system size is large. In smaller systems, however, there could be other policies that yield better performance. For instance, simulations suggest that a seemingly naive greedy heuristic can achieve a smaller delay in moderately-sized systems, which is practically zero in the range of parameters in Figure 5. Under the greedy heuristic, an available server simply fetches a job from a longest connected queue, and a job is immediately sent to a connected idle server upon arrival if possible. Intuitively, the greedy policy can provide a better delay because it avoids the overhead of holding jobs in queues while forming a batch. Unfortunately, it appears challenging to establish rigorous delay or capacity guarantees for the greedy heuristic and other similar policies.
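The two rules of the greedy heuristic can be sketched in a few lines of Python. This is only an illustration under assumed data structures (adjacency lists and a dictionary of queue lengths); the function names and the tie-breaking behavior are hypothetical and are not specified in the text.

```python
from typing import Dict, List, Optional, Set


def pick_queue_for_server(server: str,
                          neighbors_of_server: Dict[str, List[str]],
                          queue_lengths: Dict[str, int]) -> Optional[str]:
    """An available server fetches a job from a longest connected queue.

    Returns the chosen queue, or None if all connected queues are empty.
    Ties are broken by adjacency-list order (an arbitrary choice).
    """
    candidates = [q for q in neighbors_of_server[server]
                  if queue_lengths.get(q, 0) > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda q: queue_lengths[q])


def pick_server_for_arrival(queue: str,
                            neighbors_of_queue: Dict[str, List[str]],
                            idle_servers: Set[str]) -> Optional[str]:
    """An arriving job is sent to a connected idle server, if one exists.

    Returns the chosen server, or None if the job has to wait in its queue.
    """
    for server in neighbors_of_queue[queue]:
        if server in idle_servers:
            return server
    return None
```

Note that neither rule forms batches, which is the overhead the greedy policy avoids.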

In some applications, such as call centers, the service times or job sizes may not be exponentially distributed [10]. In Figure 5, we also include the scenario where the job sizes are drawn from a log-normal distribution ([3]) with an increased variance. Interestingly, the average delay appears to be somewhat insensitive to the change in job size distribution.
5. Summary and Future Research

The main message of this paper is that the two objectives of a large capacity region and an asymp¬ 
totically vanishing delay can be simultaneously achieved even if the level of processing flexibility 
of each server is small compared to the system size. Our main results show that, as far as these 
objectives are concerned, the family of Expander architectures is essentially optimal: it admits a 
capacity region whose size is within a constant factor of the maximum possible, while ensuring an 
asymptotically vanishing queueing delay for all arrival rate vectors in the capacity region. 

An alternative design, the Random Modular architecture, guarantees small delays for "many" arrival rates, by means of a simple greedy scheduling policy. However, for any given Modular architecture, there are always many arrival rate vectors in $\Lambda_n(u_n)$ that result in an unstable system, even if the maximum arrival rate across the queues is of constant order. Nevertheless, the simplicity of the Modular architectures can still be appealing in some practical settings.

Our result for the Expander architecture leaves open three questions: 

1. Is it possible to lower the requirement on the average degree from $d_n \gg \ln n$ to $d_n \gg 1$?

2. Without sacrificing the size of the capacity region, is it possible to achieve a queueing delay which approaches zero exponentially fast as a function of $d_n$? The delay scaling in Theorem 3.4 is $O(\ln n / d_n)$.


3. Is it possible to obtain delay and stability guarantees under simpler policies, such as the 


greedy heuristic mentioned in Section 4.10? The techniques developed in m for analyzing 
first-come-first-serve scheduling rules in a multi-class queueing network similar to ours could 
be a useful starting point. 

Finally, the scaling regime considered in this paper assumes that the traffic intensity is fixed 
as n increases, which fails to capture system performance in the heavy-traffic regime (p ~ 1). It 
would be interesting to consider a scaling regime in which p and n scale simultaneously (e.g., as in 
the celebrated Halfin-Whitt regime ffH), but it is unclear at this stage what the most appropriate 
formulations and analytical techniques are. 
Appendix A: Proofs
A.1. Proof of Lemma 3.2

Proof. Fix $\lambda = (\lambda_1, \ldots, \lambda_n) \in \Lambda_n(u_n)$, and let $g_n$ be a $(\gamma/\beta_n, \beta_n)$-expander, where $\gamma > \rho$ and $\beta_n \ge u_n$. By the max-flow-min-cut theorem, and the fact that all servers have unit capacity, it suffices to show that

\[ \sum_{i \in S} \lambda_i < |\mathcal{N}(S)|, \quad \forall\, S \subset I. \tag{30} \]

We consider two cases, depending on the size of $S$.

1. Suppose that $|S| \le \gamma n / \beta_n$. By the expansion property of $g_n$, we have that

\[ |\mathcal{N}(S)| \ge \beta_n |S| \ge u_n |S| > \sum_{i \in S} \lambda_i, \tag{31} \]

where the second inequality follows from the fact that $\beta_n \ge u_n$, and the last inequality from $\lambda_i < u_n$ for all $i \in I$.

2. Suppose that $|S| > \gamma n / \beta_n$. By removing, if necessary, some of the nodes in $S$, we obtain a set $S' \subset S$ of size exactly $\gamma n / \beta_n$, and

\[ |\mathcal{N}(S)| \ge |\mathcal{N}(S')| \overset{(a)}{\ge} \gamma n \overset{(b)}{>} \rho n \ge \sum_{i \in S} \lambda_i, \tag{32} \]

where step (a) follows from the expansion property, and step (b) from the assumption that $\gamma > \rho$. This completes the proof. Q.E.D.
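Condition (30) can be checked directly on small instances. The Python sketch below enumerates all non-empty queue subsets and verifies that the total arrival rate is strictly smaller than the number of neighboring servers; it takes exponential time and is meant only as an illustration. The function name and the dictionary encoding of the graph are assumptions made for this example.

```python
from itertools import combinations
from typing import Dict, List


def cut_condition_holds(rates: Dict[str, float],
                        neighbors: Dict[str, List[str]]) -> bool:
    """Check sum_{i in S} rate_i < |N(S)| for every non-empty subset S of queues.

    rates[i] is the arrival rate of queue i and neighbors[i] lists the
    unit-capacity servers connected to queue i. Brute force over all subsets.
    """
    queues = list(rates)
    for size in range(1, len(queues) + 1):
        for subset in combinations(queues, size):
            neighborhood = set()
            for queue in subset:
                neighborhood.update(neighbors[queue])
            if sum(rates[queue] for queue in subset) >= len(neighborhood):
                return False
    return True
```

For instance, two queues with rate 0.7 each that share a single server violate the condition, since the subset containing both has total rate 1.4 but only one neighbor.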


A.2. Proof of Lemma 3.3


Proof. Lemma 3.3 is a consequence of the following standard result (cf. |T]), where we let $d = d_n$, $\beta = \beta_n$, and $\alpha = \gamma/\beta_n = \sqrt{\rho}/\beta_n$, and observe that $\log_2 \beta_n \ll \beta_n$ as $n \to \infty$.

Lemma A.1 Fix $n \ge 1$, $\beta \ge 1$ and $\alpha\beta < 1$. If

\[ d \ge \frac{1 + \log_2 \beta + (\beta + 1)\log_2 e}{-\log_2(\alpha\beta)} + \beta + 1, \tag{33} \]

then there exists an $(\alpha, \beta)$-expander with maximum degree $d$.


Q.E.D. 

A.3. Proof of Theorem 3.5

Proof. Since the arrival rate vector $\lambda_n$ whose existence we want to show can depend on the architecture, we assume, without loss of generality, that servers and queues are clustered in the same manner: server $i$ and queue $i$ belong to the same cluster. Since all servers have capacity 1, and each cluster has exactly $d_n$ servers, it suffices to show that there exists $\lambda = (\lambda_1, \ldots, \lambda_n) \in \Lambda_n(u_n)$ such that the total arrival rate to the first queue cluster exceeds $d_n$, i.e.,

\[ \sum_{i=1}^{d_n} \lambda_i > d_n. \tag{34} \]

To this end, consider the vector $\lambda$ where $\lambda_i = \min\{2, (1 + u_n)/2\}$ for all $i \in \{1, \ldots, d_n\}$, and $\lambda_i = 0$ for $i \ge d_n + 1$. Because of the assumption $u_n > 1$ in the statement of the theorem, we have that

\[ \max_{1 \le i \le n} \lambda_i = \min\{2, (1 + u_n)/2\} \le \frac{1 + u_n}{2} < u_n, \tag{35} \]

and

\[ \sum_{i=1}^{n} \lambda_i = d_n \min\{2, (1 + u_n)/2\} \le 2 d_n \le 2 \cdot \frac{\rho}{2} n = \rho n, \tag{36} \]

where the last inequality in Eq. (36) follows from the assumption that $d_n \le \frac{\rho}{2} n$. Eqs. (35) and (36) together ensure that $\lambda \in \Lambda_n(u_n)$ (cf. Condition 1). Since we have assumed that $u_n > 1$, we have $\lambda_i > 1$, for $i = 1, \ldots, d_n$, and therefore Eq. (34) holds for this $\lambda$. We thus have that $\lambda \notin R(g_n)$, which proves our claim. Q.E.D.
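The construction in this proof is easy to reproduce numerically. The Python sketch below builds the arrival rate vector that concentrates load on the first cluster and can be used to confirm the three properties used in the argument: the maximum rate is below $u_n$, the total rate is at most $\rho n$, and the first cluster receives more than $d_n$ units of load. The function name and the example parameter values are assumptions chosen for illustration.

```python
from typing import List


def overloading_vector(n: int, d_n: int, u_n: float) -> List[float]:
    """Arrival rates that overload the first cluster of d_n queues.

    The first d_n queues get rate min(2, (1 + u_n) / 2); all others get 0.
    Requires u_n > 1, as in the statement of the theorem.
    """
    if u_n <= 1.0:
        raise ValueError("the construction requires u_n > 1")
    rate = min(2.0, (1.0 + u_n) / 2.0)
    return [rate] * d_n + [0.0] * (n - d_n)


# Example with n = 20, d_n = 4, u_n = 3 and rho = 0.5 (so d_n <= rho * n / 2).
rates = overloading_vector(20, 4, 3.0)
first_cluster_load = sum(rates[:4])  # exceeds the cluster capacity d_n = 4
```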

A.4. Proof of Theorem 3.6

Proof. Part (a); Eq. ([^. We will use the following classical result due to Hoeffding, adapted from 
Theorem 3 in m- 


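Lemma A.2 Fix integers $m$ and $n$, where $0 < m < n$. Let $X_1, X_2, \ldots, X_m$ be random variables drawn uniformly from a finite set $C = \{c_1, \ldots, c_n\}$, without replacement. Suppose that $0 \le c_i \le b$ for all $i$, and let $\sigma^2 = \mathrm{Var}(X_1)$. Let $\bar{X} = \frac{1}{m}\sum_{i=1}^{m} X_i$. Then,

\[ \mathbb{P}\left(\bar{X} \ge \mathbb{E}(\bar{X}) + t\right) \le \exp\left(-\frac{mt}{b}\left[\left(1 + \frac{\sigma^2}{bt}\right)\ln\left(1 + \frac{bt}{\sigma^2}\right) - 1\right]\right), \tag{37} \]

for all $t \in (0, b)$.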


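We fix some $\lambda_n \in \Lambda_n(u_n)$. If $u_n < 1$, then $\lambda_n \in \Lambda_n(1)$. It therefore suffices to prove the result for the case where $u_n \ge 1$, and we will henceforth assume that this is the case. Recall that $A_k \subset I$ is the set of queues in the $k$th queue cluster generated by the partition $\sigma_n = (A_1, \ldots, A_{n/d_n})$. We consider some $\epsilon \in (0, 1/\rho)$, and define the event $E_k$ as

\[ E_k = \left\{ \sum_{i \in A_k} \lambda_i > (1 + \epsilon)\rho d_n \right\}. \tag{38} \]

Since $\sigma_n$ is drawn uniformly at random from all possible partitions, it is not difficult to see that $\sum_{i \in A_1} \lambda_i$ has the same distribution as $\sum_{i=1}^{d_n} X_i$, where $X_1, X_2, \ldots, X_{d_n}$ are $d_n$ random variables drawn uniformly at random, without replacement, from the set $\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$. Note that $\epsilon\rho < 1 \le u_n$, so that $\epsilon\rho \in (0, u_n)$. We can therefore apply Lemma A.2 with $m = d_n$, $b = u_n$, and $t = \epsilon\rho$, to obtain

\[
\begin{aligned}
\mathbb{P}(E_1) &= \mathbb{P}\left( \sum_{i=1}^{d_n} X_i > (1 + \epsilon)\rho d_n \right) \\
&\overset{(a)}{\le} \exp\left( -\frac{\epsilon\rho d_n}{u_n} \left[ \left(1 + \frac{\mathrm{Var}(X_1)}{\epsilon\rho u_n}\right) \ln\left(1 + \frac{\epsilon\rho u_n}{\mathrm{Var}(X_1)}\right) - 1 \right] \right),
\end{aligned}
\tag{39}
\]

where the probability is taken with respect to the randomness in $G$, and where in step (a) we used the fact that

\[ \mathbb{E}\left( \frac{1}{d_n}\sum_{i=1}^{d_n} X_i \right) = \frac{1}{d_n}\sum_{i=1}^{d_n} \mathbb{E}(X_i) = \frac{1}{n}\sum_{i=1}^{n} \lambda_i \le \rho. \tag{40} \]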

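We now develop an upper bound on $\mathrm{Var}(X_1)$. Since $X_1$ takes values in $[0, u_n]$, we have $X_1^2 \le u_n X_1$ and, therefore,

\[ \mathrm{Var}(X_1) \le \mathbb{E}(X_1^2) \le u_n \mathbb{E}(X_1) \le \rho u_n. \tag{41} \]

Observe that for all $a, x > 0$,

\[ \frac{d}{dx}\left[(1 + x/a)\ln(1 + a/x)\right] = -\frac{1}{x} + \frac{1}{a}\ln(1 + a/x) \le -\frac{1}{x} + \frac{1}{a}\cdot\frac{a}{x} = 0. \tag{42} \]

Therefore, with the substitutions $a = \epsilon\rho u_n$ and $x = \mathrm{Var}(X_1)$, we have that the right-hand side of (39) is increasing in $\mathrm{Var}(X_1)$. Combining Eqs. (39) and (41), we obtain

\[ \mathbb{P}(E_1) \le \exp\left( -\frac{\epsilon\rho d_n}{u_n}\left[\left(1 + \frac{1}{\epsilon}\right)\ln(1 + \epsilon) - 1\right] \right). \]

Note that

\[ \frac{d}{dx}\left[\left(1 + \frac{1}{x}\right)\ln(1 + x)\right] = \frac{1}{x^2}\left(x - \ln(1 + x)\right) \overset{(a)}{\to} \frac{1}{2}, \quad \text{as } x \downarrow 0, \tag{43} \]

where step (a) follows from applying l'Hôpital's rule. We thus have that $\left[(1 + \frac{1}{\epsilon})\ln(1 + \epsilon) - 1\right] \sim \frac{1}{2}\epsilon > \frac{1}{3}\epsilon$, as $\epsilon \downarrow 0$. It follows that there exists $\theta > 0$ such that for all $\epsilon \in (0, \theta)$,

\[ \mathbb{P}(E_1) \le \exp\left( -\frac{\rho}{3}\cdot\frac{\epsilon^2 d_n}{u_n} \right). \tag{44} \]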

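Let $\epsilon = \frac{1}{2}\min\{\frac{1}{\rho} - 1, \theta\}$; in particular, our earlier assumption that $\epsilon\rho < 1$ is satisfied. Suppose that $u_n \le \frac{\rho\epsilon^2}{6} d_n \ln^{-1} n$. Combining Eq. (44) with the union bound, we have that

\[
\begin{aligned}
\mathbb{P}\left( \bigcup_{k=1}^{n/d_n} E_k \right) &\le \sum_{k=1}^{n/d_n} \mathbb{P}(E_k) \\
&\le \frac{n}{d_n}\exp\left( -\frac{\rho}{3}\cdot\frac{\epsilon^2 d_n}{u_n} \right) \\
&\overset{(a)}{\le} \frac{n}{d_n}\cdot\frac{1}{n^2} \\
&\le n^{-1},
\end{aligned}
\tag{45}
\]

where step (a) follows from the assumption that $u_n \le \frac{\rho\epsilon^2}{6} d_n \ln^{-1} n$. It follows that

\[ \lim_{n \to \infty} \inf_{\lambda_n \in \Lambda_n(u_n)} \mathbb{P}\left(\lambda_n \in R(G_n)\right) \ge \lim_{n \to \infty}\left(1 - \frac{1}{n}\right) = 1. \tag{46} \]

We have therefore proved part (a) of the theorem, with $c_2 = \rho\epsilon^2/6$.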

Part (b); Eq. Q. 

Let us fix a large enough constant C 3 , whose value will be specified later, and let 


(46) 


(47) 


For this part of the proof, we will assume that Because we are interested in showing a 

result for the worst case over all A„ G A„(n„), we can assnme that <C n. 

At this point, we conld analyze the model for a worst-case choice of A„. However, the analysis 
turns out to be simpler if we employ the probabilistic method. Denote by a probability measure 
over A n{un)- Let A„ be a random vector drawn from the distribution /i„, independent of the 
randomness in the Random Modular architecture, G. (For convenience, we snppress the subscript n 
and write G instead of G„.) The following elementary fact captures the essence of the probabilistic 
method. 


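Lemma A.3 Fix $n$, a measure $\mu_n$ on $\Lambda_n(u_n)$, and a constant $a_n$. Suppose that

\[ \mathbb{P}_{\lambda_n, G}\left(\lambda_n \notin R(G)\right) \ge a_n, \tag{48} \]

where $\mathbb{P}_{\lambda_n, G}$ stands for the product of the measures $\mu_n$ (for $\lambda_n$) and $\mathbb{P}_G$ (for $G$). Then,

\[ \sup_{\tilde{\lambda}_n \in \Lambda_n(u_n)} \mathbb{P}_G\left(\tilde{\lambda}_n \notin R(G)\right) \ge a_n. \tag{49} \]

Proof. We have that

\[
\begin{aligned}
\sup_{\tilde{\lambda}_n \in \Lambda_n(u_n)} \mathbb{P}_G\left(\tilde{\lambda}_n \notin R(G)\right) &\ge \int_{\tilde{\lambda}_n \in \Lambda_n(u_n)} \mathbb{P}_G\left(\tilde{\lambda}_n \notin R(G)\right)\, d\mu_n(\tilde{\lambda}_n) \\
&= \mathbb{P}_{\lambda_n, G}\left(\lambda_n \notin R(G)\right) \\
&\ge a_n.
\end{aligned}
\tag{50}
\]

Q.E.D.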
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We will now construct sequences, : n G N}, and {a„ : n G N}, with lim„_,,oo «« = 1, so that 


Eq. (48) holds for all n. To simplify notation, in the rest of this proof we will write P instead of Pg 
or Pa„,G) 6 tc. Which particular measure we are dealing with will always be clear from the context. 

Fix n G N. We first construct the distribution /r„. Let A' = (A'l, A 2 ,..., A^) be a random vector 
with independent components and with 

W.p. 

* \ 0, otherwise, 

for all i. Let H be the event defined by 


(51) 


H=\^K<pn 


2=1 


Let Xn be the random vector given by 




(52) 


(53) 


where 0 is the zero vector of dimension n, and where I(-) is the indicator function. That is, A„ 
takes on the value of A' if H occurs, and is set to zero, otherwise. It is not difficult to verify that, 
by construction, we always have A„ G A„(m„). We let be the distribution of this random vector 


We next show that 


lim P(A„^R(G)) = 1, 


(54) 


which, together with Lemma A.3 above, will complete the proof of the theorem. Fix some e > ^ — 1, 


so that (1 + e)p > 1, and define the event 

Efc = < y] A' > (l + e)/9d„ > , A: G {1,... ,n/d„}. 




Note that, if some occurs, then A' will not be in M(G). Therefore, 


l/drt 


P(A'^R(G))>P U Ek 


, fc = l 


Let Ai, A 2 ,... be i.i.d. Bernoulli random variables with 

E(Xi)=P(Xi = l) = 


(1 + e)u„' 


By the definition of A' (cf. Eq. ([^), we have that 


P(Ei)=P J]A'>(l + e)pd 

\iGAi 
/ dn 


w > (1 +^)p 


dn 


, 2=1 


^|^W>(l + e)^E(V)V 


(55) 


(56) 


(57) 


( 58 ) 
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By Sanov’s theorem (cf. Chapter 12 of [S]), we have that 


P(£;i) =P > (1 + (Xi) 

(l + e)p 


>^exp( 


P 


{l + e)Vr. 


dr 


(59) 


where $D_B(p \| q)$ is the Kullback-Leibler divergence between two Bernoulli distributions with parameters $p$ and $q$, respectively:

\[ D_B(p \| q) = p \ln\frac{p}{q} + (1 - p)\ln\frac{1 - p}{1 - q}. \tag{60} \]

Let us fix some $r \in (0, 1)$. Using the fact that $\ln(1 + y) \sim y$ as $y \to 0$, we have that

\[ D_B(x \| rx) \sim x\left[\ln\frac{1}{r} - (1 - r)\right], \quad \text{as } x \to 0. \tag{61} \]
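The asymptotic relation (61) can be checked numerically. The following Python sketch computes the Bernoulli Kullback-Leibler divergence of Eq. (60) and the small-$x$ approximation $x[\ln(1/r) - (1 - r)]$; the function names and the sample values of $x$ and $r$ are illustrative assumptions.

```python
import math


def bernoulli_kl(p: float, q: float) -> float:
    """Kullback-Leibler divergence D_B(p || q) between Bernoulli(p) and Bernoulli(q)."""

    def term(a: float, b: float) -> float:
        # Use the convention 0 * ln(0 / b) = 0.
        return 0.0 if a == 0.0 else a * math.log(a / b)

    return term(p, q) + term(1.0 - p, 1.0 - q)


def small_x_approximation(x: float, r: float) -> float:
    """First-order approximation of D_B(x || r x) as x -> 0, for r in (0, 1)."""
    return x * (math.log(1.0 / r) - (1.0 - r))


# The ratio of the exact divergence to the approximation approaches 1 as x shrinks.
ratio = bernoulli_kl(1e-4, 0.25 * 1e-4) / small_x_approximation(1e-4, 0.25)
```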


Recall that > Ci Inn and > /Inn. By Eq. (61), with x = (1 -|- e)p/u„, r = 1/(1 -|- e)^, and for 
the given Ci, we can set C 3 to be sufficiently large so that 


Df 


(l + e)p 


{l + e)vr. 


<2 


(l + e)p 


ln(l + e)^+ 1- 


1 


(1 + e)^ 


_2h 

Vn ’ 

for all sufficiently large n, where h = (1 -|- e)p ln(l -|- e)^ “ {nhj^ 

dA 1 


(62) 


and (62), we have that 


IP(^i) ^ :^exp ( -2h— ) > -^n 


dl 


-2h/cz 


dl 


> 0. Combining Eqs. (59) 


(63) 


where step (a) follows from the assumption that > C 3 d„/lnn. Equation (63) can be rewritten in 
the form 


P(L^i) > 

where c is a positive constant, and where the inequality is valid for large enough n. 
Fix C 3 = 40/i, and recall that e > ^ — 1. We have that 

/n/dn \ 

P(A'^R(G))>P U eA 


(64) 


(a) 


V k=l 
n/d„ 


1 _ J|(l_p(^,)) 




=i-(i-p(Ei)y 


1 /dn 


(b) 


>1 - {1 - cd-^F-^^/^^drr/n) 


n/dn 


(c) 


> 1 — (1 — cnP'^^dn/n) 


1 j dn 


—as n ^ 00 


(65) 
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where step (a) is based on the independence among the events E^, which is in turn based on 


the independence among the A's; step ( 6 ) follows from Eq. (64) and some rearrangement; step (c) 
follows from the assumption in the statement of the theorem that dn < n°'^, and our choice of 
C 3 = 40/i. 

We next show that the event H occurs with high probability when n is large. Let, as before, the 
XiS be i.i.d. Bernoulli random variables with E(Xi) = ^ . Then, 


¥{H) =f[^K 


' < 


S pn 


^ W < pn/vn j 

< (1 + e)IE (Xi) I —1, asn—)-oo. 


, 2=1 

i 


2=1 


by the weak law of large numbers. 


( 66 ) 


We are now ready to prove Eq. (54). We have that 


IPa„,g (a„ i R(G)) =Pv,G (l(R) A' i R(G)) 

=Pv,G(Rn{A'^R(G)}) 

>P(R)+P(A'^R(G)) -1 

^1, as n —)■ 00 , (67) 


where the last step follows from Eqs. ( |65| ) and ( 66 ). By Lemma A.3, Eq. (67) 
implies that lim„_,,oo (A„ ^ R(G)) = 1, which is in turn equivalent to 

lim„_,oo infA„GA„(«„) Pg„ (A„ G R(G)) = 0. This proves Eq. ([^. Q.E.D. 

A.5. Proof of Theorem 3.7


Proof. Denote by Qi{t) the number of jobs in queue i at time t, and by Qk{t) the total number of 
jobs in queue cluster k, i.e.. 


Qkit) ='^ ( 68 ) 

leAf. 

We note that Qfc(') is the number of jobs in an MjMfc queue, with c = dn and arrival rate 
rjk = Also note that since A„ G 7 R( 5 „), we have that pk < Using the formula for the 

expected waiting time in queue fov an MjMic queue (cf. Section 2.3 of [3]), one can show that the 
average waiting time across jobs arriving to cluster k, Wk, satisfies 


E {Wk\X) = ^ 


C{dn,Pk) 

dn Pk 


< 


C{dn,^dn) 

{l-Adn 


<ex.p{-b-dn), 


(69) 
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where C{c,r) is given by 


C{c,r) = — 


1 


1 


c! c(l —r/c)^ \ c! 1 — r/c 


C—1 A 

i=0 


The last inequality in Eq. (69) follows from the fact that for any given 7 G (0,1), there exists 
6 > 0, so that C{x,'yx) < exp (—6 ■ x) as x —)• 00 , as can be checked through elementary algebraic 
manipulations. Q.E.D. 
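For readers who want to see the exponential decay numerically, the Python sketch below evaluates the textbook Erlang C formula for an M/M/c queue with unit-rate servers, together with the corresponding mean waiting time in queue, $C(c, r)/(c - r)$. It uses the standard form of the formula, which may be arranged differently from the expression above; the function names are assumptions.

```python
import math


def erlang_c(c: int, r: float) -> float:
    """Erlang C formula: probability that an arriving job must wait.

    c is the number of unit-rate servers and r is the offered load
    (total arrival rate), with 0 <= r < c.
    """
    if not 0.0 <= r < c:
        raise ValueError("the offered load must satisfy 0 <= r < c")
    tail = (r ** c / math.factorial(c)) / (1.0 - r / c)
    head = sum(r ** i / math.factorial(i) for i in range(c))
    return tail / (head + tail)


def mean_wait_mmc(c: int, r: float) -> float:
    """Mean waiting time in queue for an M/M/c queue with unit service rate."""
    return erlang_c(c, r) / (c - r)
```

With the load held at a fixed fraction of the number of servers (for example `r = 0.5 * c`), `mean_wait_mmc(c, r)` drops rapidly as `c` grows, in line with the exponential bound in Eq. (69).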

A.6. Lower Bound on the Total Arrival Rate

We show in this section that the assumption that $\rho \in (1/2, 1)$ and $\sum_{i=1}^{n} \lambda_i \ge (1 - \rho)n$ (cf. Eq. (10) in Assumption 4.1) can be made without loss of generality. Fix the traffic intensity $\rho \in (0, 1)$, and suppose that $\lambda \in \Lambda_n(u_n)$. Define

\[ \rho' = \rho + \frac{1}{2}(1 - \rho) = \frac{1 + \rho}{2}. \tag{70} \]

Note that $1/2 < \rho' < 1$, and $1 - \rho' = (1 - \rho)/2$. Consider a modified vector $\lambda'$, where $\lambda'_i = (1 - \rho') + \lambda_i$, for all $i \in \{1, \ldots, n\}$. By construction, we have that

\[ \sum_{i=1}^{n} \lambda'_i \ge (1 - \rho')n, \tag{71} \]

\[ \sum_{i=1}^{n} \lambda'_i \le (1 - \rho')n + \sum_{i=1}^{n} \lambda_i \le (1 - \rho')n + \rho n = \rho' n, \tag{72} \]

\[ \max_{1 \le i \le n} \lambda'_i \le \max_{1 \le i \le n} \lambda_i + (1 - \rho') < u_n + (1 - \rho'). \tag{73} \]

The above definition of $\lambda'$ amounts to the following: we feed each queue with an additional independent Poisson stream of artificial (dummy) jobs of rate $1 - \rho'$. By Eqs. (72) and (73), the resulting arrival rate vector, $\lambda'$, will belong to the set $\Lambda_n(u_n + 1 - \rho')$. Also, by Eq. (71), it will satisfy the lower bound (10) on the total arrival rate, albeit with a modified traffic intensity of $\rho' \in (1/2, 1)$. Therefore, our assumption can always be satisfied by the insertion of dummy jobs. Note that the increment of $1 - \rho'$ to the value of $u_n$ is insignificant in our regime of interest, where $u_n \gg 1$, and the insertion of dummy jobs only requires knowledge of the original traffic intensity, $\rho$.
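The dummy-job transformation can be written out directly. The Python sketch below applies Eq. (70) and the definition of $\lambda'$ to a given rate vector, so that the bounds (71) and (72) can be checked on a concrete example. The function name and the sample rates are assumptions for illustration.

```python
from typing import List, Tuple


def add_dummy_traffic(rates: List[float], rho: float) -> Tuple[List[float], float]:
    """Add a dummy Poisson stream of rate 1 - rho' to every queue.

    rho' = (1 + rho) / 2 is the modified traffic intensity of Eq. (70).
    Returns the modified rate vector and rho'.
    """
    rho_prime = (1.0 + rho) / 2.0
    padded = [rate + (1.0 - rho_prime) for rate in rates]
    return padded, rho_prime


# Example: n = 4 queues, total rate 0.6 <= rho * n with rho = 0.3.
padded_rates, rho_prime = add_dummy_traffic([0.0, 0.2, 0.4, 0.0], 0.3)
```

In this example $\rho' = 0.65$, each queue gains 0.35 units of dummy traffic, and the new total rate of 2.0 lies between $(1 - \rho')n = 1.4$ and $\rho' n = 2.6$.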

A.7. Proof of Lemma 4.5

Proof. Note that because there are pbn jobs in a batch, the size of T is at most p6„, which is in 
turn less than m„. This guarantees that the cardinality of T can be taken to be m„. It therefore 
suffices to show that 

P f max Ai>Un] < 1/re^. (74) 

yi<i<n J 

There is a total of pbn arriving jobs in a single batch, and for each arriving job 

A,; (“) A, 


P (the job arrives to queue i) = 


e:=iA. 


V W I I 

7 . ^ <(—/?„, (75) 

(1 —p)n (1 —p)re 2 n Znp 
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for all i, where steps (a) and (b) follow from the assumptions that ^^=1 ^ (1 “ (Eq. 


in Assumption 4.1) and that Un < (in the statement of Theorem 3.4), respectively. From 


Eq. (75), Ai is stochastically dominated by a binomial random variable A = Bino(yo6„ , ^/3n), with 

pbnfF 


E ( A) — pbn ~ 2 f 


n 


rrin 

n 


2 


(76) 


Based on this expression of E , we will now use an exponential tail bound to bound the 
probability of the event {maxi<j<„ A* > iin}. Recall that Using the union bound. 


we have that 


max Ai > I = 

l<i<n 


i>Un, for some i) 
< nP(Ai > Un) 


< nP f A > 


Ur 


= nP ( A > 


(b) 


2E(A)) 


< nexp ( — ^E(A)J 

, p bnl3n 

= nexp -- 

bp n 

^ I P 

< n exp — - 


6 n 

p 320 nlnn /3„ 

= —'n 

/ 160, 

< n exp I —^ In n 


< n 


-3 


(77) 


(78) 


Step (a) follows from Eq. (76). Step (b) follows from the following multiplicative form of the Chernoff bound (cf. Chapter 4 of [23]), with $\delta = 1$: $\mathbb{P}(X \ge (1 + \delta)\mu) \le \exp(-\frac{\delta^2}{3}\mu)$, where $X$ is a binomial random variable with $\mathbb{E}(X) = \mu$. Step (c) follows from the assumption $\rho \in (1/2, 1)$ (cf. Assumption 4.1), and hence

\[ \frac{\rho}{(1 - \rho)^2} > \rho > 1/2. \tag{79} \]

This completes the proof of Lemma 4.5. Q.E.D.
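As a quick numerical illustration of the multiplicative Chernoff bound used in step (b), the Python sketch below compares the exact upper tail of a binomial random variable with the bound $\exp(-\delta^2\mu/3)$. The function names and the parameter values in the example are assumptions and are unrelated to the specific constants in the proof.

```python
import math


def binomial_upper_tail(n: int, p: float, k: int) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, j) * p ** j * (1.0 - p) ** (n - j) for j in range(k, n + 1)
    )


def chernoff_upper_bound(mu: float, delta: float) -> float:
    """Multiplicative Chernoff bound on P(X >= (1 + delta) * mu), for 0 < delta <= 1."""
    return math.exp(-(delta ** 2) * mu / 3.0)


# Example: X ~ Binomial(200, 0.05) has mean mu = 10; compare P(X >= 2 * mu)
# against the bound with delta = 1.
exact_tail = binomial_upper_tail(200, 0.05, 20)
bound = chernoff_upper_bound(10.0, 1.0)
```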


A.8. Proof of Lemma 4.7

Proof. For a set $S \subset \hat{\Gamma}$, denote by $\mathcal{N}^*(S)$ the set of neighbors of $S$ in $G$, i.e., $\mathcal{N}^*(S) = \mathcal{N}(S) \cap \hat{\Delta}$.

To prove Lemma 4.7, we will leverage the fact that the underlying connectivity graph, $g_n$, is an expander graph with appropriate expansion. As a result, most subsets $S \subset \hat{\Gamma}$ have a large set of neighbors, $\mathcal{N}(S)$, in $g_n$. Because each server in $\mathcal{N}(S)$ belongs to $\mathcal{N}^*(S)$ independently, as a consequence of our scheduling policy, we will then use a concentration inequality to show that, with high probability, the sizes of the sets $\mathcal{N}^*(S)$ remain sufficiently large. Using the union bound over the relevant sets $S$, we will finally conclude that $G$ has the desired expansion property, with high probability.

By the definition of a (y/un, ti„)-expander, we are only interested in the expansion of subsets 
of r with size less than or equal to \T\'y/un. We first verify below that the size of such subsets S 
is sufficiently small to be able to exploit the expansion property of gn and to infer that AA*(5) is 
large. We have 

nj/Pn _ n ^ n 

~ |f I ' 


m„ 


Pn 


which is equivalent to saying 


s<'yn/l3ri, Vs<|f| 7 /u„, 


= 1 , 


(80) 


(81) 


as desired. 

For a set 5 C F, we now characterize the size of its neighborhood in G, |AA*(5')|, which depends 


on the distribution of the random subset, A. Fix some s G N with s < |r| 7 /-u„. From Eq. (81), we 


know that s < 'yn/j3n. Consider some S C f with |5| = s. Using the expansion property of Qn, we 
have that |AA(5')| >l3nS. Therefore, 


n|AA*(5)|<n„s)=p| I(jGA)<h„s 


< P ^Bino ^1AA (5) I, ^ (p + 3e/4)^ < 
<P ^Bino ^ (/9 + 3e/4)^ < UnS^ , 


(82) 


for all sufficiently large n. Step (a) follows from the assumption that P(j € A) >(p + 3e/4)^, and 
step (b) from the inequality |AA(S')| > /3„s. We observe that 

/i =E ^Bino ^/3„s , — (p + 3e/4) 

=(p + 3e/4)^s 
n 


C)/ , Q 80 nlnn 

= [p + 3e/4)-- • — 5 — /3nS 


n 


Pn 


, ,., 80 In n 

= {p + 3e/4)^—s, 


( 83 ) 


where in step (a) we used the substitution bn = ^ ■ We also have that 
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pn 

p 80 re Inre 
~Pn ~ * 


pn 
p 80 Inre 

p 


l^n 


(84) 


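By combining Eqs. (83) and (84), we can derive a useful lower bound on the quantity $1 - \hat{u}_n s / \mu$, which is recorded in the lemma that follows.

Lemma A.4 We have that

\[ 1 - \frac{\hat{u}_n s}{\mu} \ge \frac{\epsilon}{2}. \tag{85} \]

Proof. Using Eqs. (83) and (84) in the first step below, we have that

\[ 1 - \frac{\hat{u}_n s}{\mu} = 1 - \frac{\rho}{\hat{\rho}(\rho + 3\epsilon/4)}. \]

Recall that $\epsilon = (1 - \rho)/2$, so that $\rho = 1 - 2\epsilon$, and that $\hat{\rho} = 1/(1 + \epsilon/4)$. Using these substitutions, we obtain

\[
\begin{aligned}
1 - \frac{\hat{u}_n s}{\mu} &= 1 - \frac{(1 - 2\epsilon)(1 + \epsilon/4)}{1 - 2\epsilon + 3\epsilon/4} \\
&= \frac{3\epsilon/4 - \epsilon/4 + 2\epsilon^2/4}{1 - 5\epsilon/4} \\
&= \frac{\epsilon(1 + \epsilon)/2}{1 - 5\epsilon/4} \\
&\ge \frac{\epsilon}{2}.
\end{aligned}
\]

Q.E.D.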


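To obtain an upper bound for the probability in Eq. (82), we substitute Eqs. (83) and (85) into Eq. (82). Given the assumption that $s \le \gamma n / \beta_n$, we have that

\[
\begin{aligned}
\mathbb{P}\left(|\mathcal{N}^*(S)| \le \hat{u}_n s\right) &\le \mathbb{P}\left( \mathrm{Bino}\left(\beta_n s, (\rho + 3\epsilon/4)\frac{b_n}{n}\right) \le \hat{u}_n s \right) \\
&\overset{(a)}{\le} \exp\left( -\frac{1}{2}\left(\frac{\epsilon}{2}\right)^2 \mu \right) \\
&\overset{(b)}{=} \exp\left( -\frac{\epsilon^2}{8}\cdot\frac{80\ln n}{\epsilon^2}(\rho + 3\epsilon/4)s \right) \\
&= \exp\left(-(10\ln n)(\rho + 3\epsilon/4)s\right) \\
&\overset{(c)}{\le} \exp\left(-(5\ln n)s\right) \\
&= \frac{1}{n^{5s}},
\end{aligned}
\tag{86}
\]

for all sufficiently large $n$. Step (a) is based on a multiplicative form of the Chernoff bound (cf. Chapter 4 of [23]), $\mathbb{P}(X \le (1 - \delta)\mu) \le \exp(-\frac{\delta^2}{2}\mu)$, where $X$ is a binomial random variable with $\mathbb{E}(X) = \mu$, and

\[ \delta = 1 - \frac{\hat{u}_n s}{\mu} \ge \epsilon/2, \tag{87} \]

where the last inequality follows from Lemma A.4. Step (b) follows from Eq. (83), and (c) from the assumption that $\rho > 1/2$.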


We now apply Eq. (86) to subsets of E, and use the union bound. We have, for all sufficiently 
large n, that 

G is not a {'y/u„ ,u„)-expander'j <P(3S' C f such that: 151 < |f and |A^*(5)| < 'U„|5|) 


(a 


|f|7/i 


®=1 \Scf.|S|=s 

|f|7/i 




in 


r{W*{S)\<UnS) 


^ E 

|r|7/^n 

< 6^P(|AA*(5)|<n„s) 

S=1 


o' 

< 


E 


S=1 

oo 


< 


T.V’-/ 

S^l 

Kfn^ 


5 \s 


( 88 ) 


1 - ■ 

Step (a) is the union bound. In step (6), we used the bound (2) < and the fact that |r| = = 


bn <bn- Step (c) follows from Eq. (86). Because /?„ ^ Inn, we have that <C n, and hence 


bn 1 
— < —, 
n® n^ 


for all sufficiently large n. Combining Eqs. (88) and (89), we conclude that 


G is not a ( , n„ ) -expander ) < —, 


(89) 


(90) 


for all sufficiently large n. This proves our claim. Q.E.D. 


Appendix B: Expanded Modular Architectures 

In this appendix, we start by describing the graph product, and subsequently we discuss the 
implications of using an expander graph. 
Construction of the Architecture. We first express the average degree as a product, $d_n = d_n^m \cdot d_n^e$, where the relative magnitudes of $d_n^m$ and $d_n^e$ are a design choice. The architecture is constructed as follows.

1. Similar to the case of the Modular architecture, partition $I$ and $J$ into equal-sized clusters of size $d_n^m$. We will refer to the index set of the queue and server clusters as $Q$ and $S$, respectively. For any $i \in I$ and $j \in J$, denote by $q(i) \in Q$ and $s(j) \in S$ the indices of the queue and server clusters to which $i$ and $j$ belong, respectively.

2. Let $g_n^e$ be a bipartite graph of maximum degree $d_n^e$ whose left and right nodes are the queue and server clusters, $Q$ and $S$, respectively. Let $E^e$ be the set of edges of $g_n^e$.

3. To construct the interconnection topology $g_n = (I \cup J, E)$, let $(i, j) \in E$ if and only if their corresponding queue and server clusters are connected in $g_n^e$, i.e., if $(q(i), s(j)) \in E^e$.

Note that by the above construction, each queue is connected to at most $d_n^e$ server clusters through $g_n^e$, and within each connected cluster, to $d_n^m$ servers. Therefore, the maximum degree of $g_n$ is $d_n^m \cdot d_n^e = d_n$.
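The three construction steps above can be sketched as a small Python function that expands a cluster-level graph into the queue-server edge set. The function name, the use of integer indices with contiguous clusters (so that `q(i) = i // cluster_size`), and the toy cluster graph are assumptions made for illustration.

```python
from typing import Set, Tuple


def expanded_modular_edges(n: int, cluster_size: int,
                           cluster_edges: Set[Tuple[int, int]]) -> Set[Tuple[int, int]]:
    """Build the edge set of the queue-server graph from a cluster-level graph.

    Queues and servers are indexed 0..n-1 and grouped into contiguous clusters
    of the given size. Queue i and server j are connected if and only if the
    pair (cluster of i, cluster of j) is an edge of the cluster-level graph.
    """
    if n % cluster_size != 0:
        raise ValueError("n must be a multiple of the cluster size")
    edges = set()
    for i in range(n):
        for j in range(n):
            if (i // cluster_size, j // cluster_size) in cluster_edges:
                edges.add((i, j))
    return edges


# Toy example: 4 queues and 4 servers in clusters of size 2; queue cluster 0 is
# connected to server clusters 0 and 1, and queue cluster 1 to server cluster 1.
toy_edges = expanded_modular_edges(4, 2, {(0, 0), (0, 1), (1, 1)})
```

In the toy example, queue 0 has degree 4 (cluster size 2 times cluster-level degree 2), which matches the product form of the maximum degree.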

Scheduling Policy. The scheduling policy requires the knowledge of the arrival rate vector, $\lambda_n$, and involves two stages. For a given $\lambda_n$, the computation in the first stage is performed only once, while the steps in the second stage are repeated throughout the operation of the system.

1 . Compute a feasible flow, {fq,s}iq,s)eE<=, over the graph 5 ®, where the incoming flow at each 

queue cluster 5 G Q is equal to the outgoing flow at each server cluster s G 5 is 

constrained to be less than or equal to (It turns out that, under our assumptions, such 

a feasible flow exists |33j.) Denote by fq^s the total rate of flow from the queue cluster q to 
the server cluster s. 

2. Arriving jobs first wait in queue until they are fetched by a server. When a server becomes available, it chooses a neighboring queue cluster (w.r.t. the topology of $g_n^e$) with probability roughly proportional to the flow between the clusters. In particular, a server in cluster $s$ chooses the queue cluster $q$ with probability

\[ p_{s,q} = \frac{f_{q,s}}{\sum_{q' \in \mathcal{N}(s)} f_{q',s}} \cdot \frac{1 + \rho}{2} + \frac{1}{\deg(s)} \cdot \frac{1 - \rho}{2}, \tag{91} \]

where $\deg(s)$ is the degree of $s$ in $g_n^e$. Within the chosen cluster, the server starts serving a job from an arbitrary non-empty queue, or, if all queues in the cluster are empty, the server initiates an idling period whose length is exponentially distributed with mean 1.
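The selection probabilities in Eq. (91) mix a flow-proportional term with a uniform term. The Python sketch below computes them for one server cluster; the function name, the dictionary input, and the fallback to a uniform share when all flows into the cluster are zero are assumptions that are not specified in the text.

```python
from typing import Dict


def cluster_choice_probabilities(flows_into_s: Dict[str, float],
                                 rho: float) -> Dict[str, float]:
    """Probability that a server in cluster s picks each neighboring queue cluster.

    flows_into_s maps every neighboring queue cluster q to the flow f_{q,s}.
    Each probability is a (1 + rho)/2 weight on the flow share plus a
    (1 - rho)/2 weight on the uniform distribution over the deg(s) neighbors.
    """
    degree = len(flows_into_s)
    total_flow = sum(flows_into_s.values())
    probabilities = {}
    for q, flow in flows_into_s.items():
        # Assumption: fall back to a uniform share if no flow enters cluster s.
        flow_share = flow / total_flow if total_flow > 0 else 1.0 / degree
        probabilities[q] = (flow_share * (1.0 + rho) / 2.0
                            + (1.0 / degree) * (1.0 - rho) / 2.0)
    return probabilities


# Example: two neighboring queue clusters with flows 0.6 and 0.2, rho = 0.5.
example = cluster_choice_probabilities({"q1": 0.6, "q2": 0.2}, 0.5)
```

The two weights sum to one, so the probabilities over the neighbors of a server cluster always sum to one, and the uniform term guarantees that every neighboring queue cluster is chosen with positive probability.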

When the graph $g_n^e$ is an expander graph, we refer to the topology created via the above procedure as an Expanded Modular architecture generated by $g_n^e$.

Note that an Expanded Modular architecture is constructed as a "product" between an expander graph across the queue and server clusters, and a fully connected graph for each pair of connected clusters. As a result, its performance is also of a hybrid nature: the expansion properties of $g_n^e$ guarantee a large capacity region, while a diminishing delay is obtained as a result of the growing size of the server and queue clusters. We summarize this in the next theorem. Here we assume that $d_n^e$ is sufficiently large so that the expander graph described in Lemma 3.3 exists. The reader is referred to Section 3.4.5 of [33] for the proof of the theorem (although with different choices for some of the constants).


Theorem B.l (Capacity and Delay of Expanded Modular Architectures) Suppose that 
d„ = d™ • d® . Let 7 = ^/p and /3„ = | • d® . Let be a {'^/jSn, I3n)-expander with maximum 

degree d® , and let gn be an Expanded Modular architecture generated by g^. Lf 




l + ln(l/p) 


(92) 


then, under the scheduling policy described above, we have that 


sup E(W|A„)< 

XnGA.fi{un) 


c 

Am 


where c is a constant that does not depend on n. 


(93) 


A Tradeoff between the Size of the Capacity Region and the Delay. For the Expanded Modular architecture, the relative values of $d_n^m$ and $d_n^e$ reflect a design choice: a larger value of $d_n^e$ ensures a larger capacity region, while a larger value of $d_n^m$ yields smaller delays. Therefore, while the Expanded Modular architecture is able to provide a strong delay guarantee that applies to all arrival rate vectors in $\Lambda_n(u_n)$, it comes at the expense of either a slower rate of diminishing delay (small $d_n^m$) or a smaller capacity region (small $d_n^e$).








